Abstract

Approximate Bayesian Computation (ABC for short) is a family of
computational techniques which offer an almost automated solution
in situations where evaluation of the posterior likelihood is
computationally prohibitive, or whenever suitable likelihoods are not
available. In the present paper, we analyze the procedure from the point
of view of $k$-nearest neighbor theory and explore the statistical
properties of its outputs. We discuss in particular some asymptotic
features of the genuine conditional density estimate associated with ABC,
which is a new interesting hybrid between a $k$-nearest neighbor and a
kernel method.

Index Terms — Approximate Bayesian Computation, Nonparametric 
estimation, Conditional density estimation, Nearest neighbor methods,
Mathematical statistics.
1 Introduction



Let $Y$ be a generic random observation which may, for example, take the
form of a sample of independent and identically distributed (i.i.d.) random
variables. More generally, it may also be the first observations of a time
series or a more complex random object, such as a DNA sequence. We denote by
$\mathcal{L}(y|\theta)$ the distribution (likelihood) of $Y$, where
$\theta \in \mathbb{R}^p$ is an unknown parameter that we wish to estimate.
In the Bayesian paradigm, the parameter itself is seen as a random variable
$\Theta$, and the likelihood $\mathcal{L}(y|\theta)$ becomes the conditional
distribution of $Y$ given $\Theta = \theta$. The distribution $\pi(\theta)$
of $\Theta$ is called the prior distribution, while the distribution
$\pi(\theta|y)$ of $\Theta$ given $Y = y$ is termed posterior.

When taking a Bayesian perspective, inference about the parameter typically
proceeds via calculation or simulation of the posterior distribution
$\pi(\theta|y)$. A variety of methods exist for inference in this context,
such as rejection algorithms (Ripley [36]), Markov Chain Monte Carlo (MCMC)
methods (e.g., the Metropolis-Hastings algorithm, Metropolis et al. [29];
Hastings [19]), and Importance Sampling (Ripley [36]). For a comprehensive
introduction to the domain, the reader is referred to the monographs by
Robert and Casella [37] and Robert and Marin [28]. However, in some contexts,
computation of the posterior is problematic, either because the size of the
data makes the calculation computationally intractable, or because
calculation is impossible when using realistic models for how the data arise.
Thus, despite their power and flexibility, MCMC procedures and their variants
may prove irrelevant in a growing number of contemporary applications
involving very large dimensions or complicated models. This computational
burden typically arises in fields such as ecology, population genetics and
image analysis, just to name a few.

This difficulty has motivated a drive to more approximate approaches, in
particular the field of Approximate Bayesian Computation (ABC for short). In
a nutshell, ABC is a family of computational techniques which offer an almost
automated solution in situations where evaluation of the likelihood is
computationally prohibitive, or whenever suitable likelihoods are not
available. The approach was originally mentioned, but not analyzed, by Rubin
in 1984. It was further developed in population genetics by Fu and Li [13],
Tavaré et al., Pritchard et al. [35] and Beaumont et al. [3], who gave the
name of Approximate Bayesian Computation to a family of likelihood-free
inference methods. Since its original developments, the ABC paradigm has
successfully been applied to various scientific areas, ranging from
archaeological science and ecology to epidemiology, stereology and protein
network analysis. There are too many references to be included here, but the
recent survey by Marin et al. [27] offers both a historical and technical
review of the domain.

Before we go into more details on ABC, some more notation is required. We
assume to be given a statistic $S$, taking values in $\mathbb{R}^m$. It is a
function of the original observation $Y$, with a dimension $m$ typically much
smaller than the dimension of $Y$. The statistic $S$ is supposed to admit a
conditional density $f(s|\theta)$ with respect to the Lebesgue measure on
$\mathbb{R}^m$. Note that, strictly speaking, we should write $S(Y)$ instead
of $S$. However, since there is no ambiguity, we will continue to use the
latter notation. As such, the statistic $S$ should be understood as a
low-dimensional summary of $Y$. It can be, for
example, a sufficient statistic for the parameter $\theta$, but not
necessarily. Assuming that $\Theta$ is absolutely continuous with respect to
the Lebesgue measure on $\mathbb{R}^p$, the conditional distribution of
$\Theta$ given $S = s$ has a density $g(\theta|s)$ which, according to Bayes'
rule, takes the form

$$g(\theta|s) = \frac{f(s|\theta)\pi(\theta)}{\bar f(s)}, \quad \text{where} \quad \bar f(s) = \int_{\mathbb{R}^p} f(s|\theta)\pi(\theta)\,d\theta$$

is the marginal density of $S$. Finally, we denote by $y_0$ the observed
realization of $Y$ (i.e., the data set), and let $s_0$ ($= s(y_0)$) be the
corresponding realization of $S$. Throughout the document, both $y_0$ and
$s_0$ should be considered as fixed quantities.

In its most common form, the generic ABC algorithm is framed as follows.

Algorithm 1 Pseudo-code 1 of a generic ABC algorithm
Require: A positive integer $N$ and a tolerance level $\varepsilon$.
for $i = 1$ to $N$ do
    Generate $\theta_i$ from the prior $\pi(\theta)$;
    Generate $y_i$ from the likelihood $\mathcal{L}(\cdot|\theta_i)$.
end for
return The $\theta_i$'s such that $\|s(y_i) - s_0\| \le \varepsilon$.

The basic idea behind this formulation is that using a representative enough
summary statistic $S$ coupled with a small enough tolerance level
$\varepsilon$ should produce a good approximation of the posterior
distribution. A moment's thought reveals that pseudo-code 1 has the flavor of
a nonparametric kernel conditional density estimation procedure, for which
$\varepsilon$ plays the role of a bandwidth. This is, for example, the point
of view that prevails in the analysis of Blum [1], who explores the
asymptotic bias and variance of kernel-type estimates of the posterior
density $g(\cdot|s_0)$ evaluated over the code outputs.

However, as made transparent by Marin et al. [27], pseudo-code 1, despite
its widespread diffusion, does not exactly match what people do in practice.
A more accurate formulation is the following one:

Algorithm 2 Pseudo-code 2 of a generic ABC algorithm
Require: A positive integer $N$ and an integer $k_N$ between 1 and $N$.
for $i = 1$ to $N$ do
    Generate $\theta_i$ from the prior $\pi(\theta)$;
    Generate $y_i$ from the likelihood $\mathcal{L}(\cdot|\theta_i)$.
end for
return The $\theta_i$'s such that $s(y_i)$ is among the $k_N$-nearest neighbors of $s_0$.

Algorithm 1 and Algorithm 2 are dual, in the sense that the number of
accepted points is fixed in the second and random in the first, while their
range is random in the second and fixed in the first. In practice, the
parameter $N$ is chosen to be very large (typically of the order of $10^6$),
while $k_N$ is most commonly expressed as a percentile. Thus, for example,
the choice $N = 10^6$ and a percentile $k_N/N = 0.1\%$ retains 1000 simulated
$\theta_i$'s.
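
The two pseudo-codes can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the original text: a toy conjugate model ($\Theta \sim N(0,1)$, observations i.i.d. $N(\theta,1)$, summary statistic equal to the sample mean), and hypothetical helper names `simulate_pairs`, `abc_tolerance` and `abc_knn`.

```python
import numpy as np

def simulate_pairs(n_sims, n_obs, rng):
    """Draw (theta_i, s_i) pairs from an assumed toy model.

    Toy model (an assumption for illustration): theta ~ N(0, 1) and, given
    theta, y is a sample of n_obs i.i.d. N(theta, 1) variables summarized
    by its sample mean.
    """
    theta = rng.normal(0.0, 1.0, size=n_sims)
    # The mean of n_obs i.i.d. N(theta, 1) draws is N(theta, 1/n_obs),
    # so the summary statistic can be simulated directly.
    s = rng.normal(theta, 1.0 / np.sqrt(n_obs))
    return theta, s

def abc_tolerance(theta, s, s0, eps):
    """Pseudo-code 1: keep the theta_i whose summary lies within eps of s0."""
    return theta[np.abs(s - s0) <= eps]

def abc_knn(theta, s, s0, k):
    """Pseudo-code 2: keep the theta_i attached to the k nearest neighbors of s0."""
    order = np.argsort(np.abs(s - s0), kind="stable")
    return theta[order[:k]]

rng = np.random.default_rng(0)
N, n_obs, s0 = 100_000, 20, 0.5
theta, s = simulate_pairs(N, n_obs, rng)

k = N // 1000                       # acceptance percentile k_N / N = 0.1%
accepted_knn = abc_knn(theta, s, s0, k)

# Duality: running pseudo-code 1 with eps set to the k-th smallest distance
# returns the same accepted values (ties have probability zero here).
eps = np.sort(np.abs(s - s0))[k - 1]
accepted_tol = abc_tolerance(theta, s, s0, eps)

exact_posterior_mean = n_obs * s0 / (n_obs + 1)   # conjugate Gaussian formula
print(len(accepted_knn), accepted_knn.mean(), exact_posterior_mean)
```

In this sketch the tolerance of pseudo-code 1 is random once $k_N$ is fixed, and conversely, which is the duality described above.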

From a nonparametric perspective, pseudo-code 2 falls within the broad
family of nearest neighbor-type procedures (Fix and Hodges [12],
Loftsgaarden and Quesenberry [24], Cover [5]). Such procedures are favored by
practitioners, because they are fast, easy to compute and flexible. For
implementation, they require only a measure of distance in the sample space,
hence their popularity as a starting point for refinement, improvement and
adaptation to new settings (see for example Devroye et al., Chapter 19). In
any case, it is our belief that ABC should be analyzed in this context, and
this is the point of view that will be taken in the present article.

In order to better understand the rationale behind Algorithm 2, denote
by $(\Theta_1, Y_1), \dots, (\Theta_N, Y_N)$ an i.i.d. sample, with common
joint distribution $\mathcal{L}(y|\theta)\pi(\theta)$. This sample is
naturally associated with the i.i.d. sequence
$(\Theta_1, S_1), \dots, (\Theta_N, S_N)$, where each pair has density
$f(s|\theta)\pi(\theta)$. Finally, let $S_{(1)}, \dots, S_{(k_N)}$ be the
$k_N$-nearest neighbors of $s_0$ among $S_1, \dots, S_N$, and let
$\Theta_{(1)}, \dots, \Theta_{(k_N)}$ be the corresponding $\Theta_j$'s (see
Figure 1 for an illustration in dimension $m = p = 1$).

With this notation, we see that the generic ABC Algorithm 2 proceeds in two
steps:

1. First, simulate (realizations of) an $N$-sample $(\Theta_1, Y_1), \dots, (\Theta_N, Y_N)$;

2. Second, return (realizations of) the variables $\Theta_{(1)}, \dots, \Theta_{(k_N)}$.

This simple observation opens the way to a mathematical analysis of ABC
via techniques based on nearest neighbors. In fact, despite a growing number
of practical applications, theoretical results guaranteeing the validity of
the approach are still lacking, with the notable exception of the paper by
Blum [1]. Our present contribution is twofold.

Figure 1: Illustration of ABC in dimension $m = p = 1$ ($d_{(k_N)} = \|S_{(k_N)} - s_0\|$).



(i) We offer in Section 2 an explicit result regarding the distribution of
the algorithm outputs $(\Theta_{(1)}, S_{(1)}), \dots, (\Theta_{(k_N)}, S_{(k_N)})$.
In a nutshell, Theorem 2.1 reveals that, conditionally on the distance
$d_{(k_N+1)} = \|S_{(k_N+1)} - s_0\|$, the simulated data set may be regarded
as $k_N$ i.i.d. realizations of the joint density of $(\Theta, S)$ restricted
to the ball centered at $s_0$ with radius $d_{(k_N+1)}$. This result is
important since, to our knowledge, no such general conclusion is available in
the literature. It gives a precise description of the output distribution of
ABC Algorithm 2.

(ii) For a fixed $s_0 \in \mathbb{R}^m$, the estimate practitioners use most
to infer the posterior density $g(\cdot|s_0)$ at some point
$\theta_0 \in \mathbb{R}^p$ is

$$\hat g_N(\theta_0) = \frac{1}{k_N h_N^p} \sum_{j=1}^{k_N} K\left(\frac{\theta_0 - \Theta_{(j)}}{h_N}\right), \tag{1.1}$$

where $\{h_N\}$ is a sequence of positive real numbers (bandwidth) and $K$
is a nonnegative Borel measurable function (kernel) on $\mathbb{R}^p$. The
idea is simple: in order to estimate the posterior, just look at the
$k_N$-nearest neighbors of $s_0$ and smooth the corresponding $\Theta_j$'s
around $\theta_0$. It should be noted that (1.1) is nothing but a smart
hybrid between a $k$-nearest neighbor and a kernel density estimation
procedure. It is different from the Rosenblatt-type kernel conditional
density estimates proposed in Beaumont et al. [3] and further explored by
Blum [1]. In Section 3 and Section 4, we establish some consistency
properties of this genuine estimate and discuss its rates of convergence.
For the sake of clarity, proofs are postponed to Section 5 and Section 6. An
appendix at the end of the paper offers some new results on convolution and 
approximation of the identity. 

To conclude this introduction, we would like to make a few comments on the
topics that will not be addressed in the present document. An important part
of the performance of the ABC approach, especially for high-dimensional data
sets, relies upon a good choice of the summary statistic $S$. In many
practical applications, this statistic is picked by an expert in the field,
without any particular guarantee of success. A systematic approach to
choosing such a statistic, based upon a sound theoretical framework, is
currently under active investigation in the Bayesian community. This
important issue will not be pursued further here. As a good starting point,
the interested reader is referred to Joyce and Marjoram [22], who develop a
sequential scheme for scoring statistics according to whether their inclusion
in the analysis will substantially improve the quality of inference.
Similarly, we will not address issues regarding how to enhance efficiency of
ABC and its variants, as for example with the sequential techniques of Sisson
et al. [12] and Beaumont et al. [2]. Nor will we explore the important
question of ABC model choice, for which theoretical arguments are still
missing (Robert et al. [38], Marin et al. [25]).

2 Distribution of ABC outputs 

We continue to use the notation of Section 1 and recall in particular that
$(\Theta_1, S_1), \dots, (\Theta_N, S_N)$ are i.i.d.
$\mathbb{R}^p \times \mathbb{R}^m$-valued random variables, with common
probability density $f(\theta, s) = f(s|\theta)\pi(\theta)$. Both
$\mathbb{R}^p$ (the space of $\Theta_i$'s) and $\mathbb{R}^m$ (the space of
$S_i$'s) are equipped with the Euclidean norm $\|\cdot\|$. In this section,
attention is focused on analyzing the distribution of the algorithm outputs
$(\Theta_{(1)}, S_{(1)}), \dots, (\Theta_{(k_N)}, S_{(k_N)})$.

In what follows, we keep $s_0$ fixed and denote by $d_i$ the (random)
distance between $s_0$ and $S_i$. (To be rigorous, we should write
$d_i(s_0)$, but since no confusion can arise we write it simply $d_i$.)
Similarly, we let $d_{(i)}$ be the distance between $s_0$ and its $i$th
nearest neighbor among $S_1, \dots, S_N$, that is

$$d_{(i)} = \|S_{(i)} - s_0\|.$$

(If distance ties occur, a tie-breaking strategy must be defined. For
example, if $\|S_i - s_0\| = \|S_j - s_0\|$, $S_i$ may be declared "closer"
if $i < j$, i.e., the tie-breaking is done by indices. Note however that ties
occur with probability 0 since all random variables are absolutely
continuous.) Finally, we let $B_m(s_0, \delta)$ denote the closed ball in
$\mathbb{R}^m$ centered at $s_0$ with nonnegative radius $\delta$, i.e.,
$B_m(s_0, \delta) = \{s \in \mathbb{R}^m : \|s - s_0\| \le \delta\}$. It is
assumed throughout the paper that $N \ge 2$ and $1 \le k_N \le N - 1$.
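
As an illustration of the index-based tie-breaking rule, the selection of the $k_N$ nearest neighbors together with the distance $d_{(k_N+1)}$ might be coded as follows. This is a minimal sketch; the function name `k_nearest_with_ties` and the small data set are hypothetical, and the rule is obtained here through a stable sort.

```python
import numpy as np

def k_nearest_with_ties(S, s0, k):
    """Indices of the k nearest neighbors of s0 among the rows of S.

    Distance ties are broken by index (the smaller index is declared closer),
    which is what a stable sort of the distances does. Also returns the
    distance d_(k+1) to the (k+1)-th nearest neighbor.
    """
    S = np.asarray(S, dtype=float)
    if not 1 <= k <= len(S) - 1:
        raise ValueError("k must satisfy 1 <= k <= N - 1")
    d = np.linalg.norm(S - np.asarray(s0, dtype=float), axis=1)
    order = np.argsort(d, kind="stable")
    return order[:k], d[order[k]]

# Rows 0 and 1 are both at distance 1 from s0 = 0: row 0 wins the tie.
S = np.array([[1.0], [-1.0], [0.5], [2.0]])
idx, d_next = k_nearest_with_ties(S, s0=[0.0], k=2)
print(idx, d_next)
```

The constraint $1 \le k_N \le N-1$ is what guarantees that $d_{(k_N+1)}$ exists.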

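Rearranging the $k_N$ (ordered) statistics
$(\Theta_{(1)}, S_{(1)}), \dots, (\Theta_{(k_N)}, S_{(k_N)})$ in the original
order of their outcome, one obtains the $k_N$ (non-ordered) random variables
$(\Theta_1^\star, S_1^\star), \dots, (\Theta_{k_N}^\star, S_{k_N}^\star)$.
Our first theorem is concerned with the conditional distributions

$$\mathcal{L}\left\{(\Theta_1^\star, S_1^\star), \dots, (\Theta_{k_N}^\star, S_{k_N}^\star) \,\middle|\, d_{(k_N+1)}\right\}$$

and

$$\mathcal{L}\left\{(\Theta_{(1)}, S_{(1)}), \dots, (\Theta_{(k_N)}, S_{(k_N)}) \,\middle|\, d_{(k_N+1)}\right\}.$$

Recall that the collection of all $s_0 \in \mathbb{R}^m$ with
$\int_{B_m(s_0, \delta)} \bar f(s)\,ds > 0$ for all $\delta > 0$ is called
the support of $\bar f$.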

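Theorem 2.1 (Distribution of ABC outputs) Assume that $s_0$ belongs to
the support of $\bar f$. Let
$(\tilde\Theta_1, \tilde S_1), \dots, (\tilde\Theta_{k_N}, \tilde S_{k_N})$
be i.i.d. random variables, with common probability density (conditional on
$d_{(k_N+1)}$)

$$\frac{1}{c_{d_{(k_N+1)}}}\,\mathbf{1}_{[\|s - s_0\| \le d_{(k_N+1)}]}\, f(\theta, s), \quad \text{where} \quad c_{d_{(k_N+1)}} = \int_{\mathbb{R}^p} \int_{B_m(s_0, d_{(k_N+1)})} f(\theta, s)\,d\theta\,ds. \tag{2.1}$$

Then

$$\mathcal{L}\left\{(\Theta_1^\star, S_1^\star), \dots, (\Theta_{k_N}^\star, S_{k_N}^\star) \,\middle|\, d_{(k_N+1)}\right\} = \mathcal{L}\left\{(\tilde\Theta_1, \tilde S_1), \dots, (\tilde\Theta_{k_N}, \tilde S_{k_N})\right\}.$$

Moreover

$$\mathcal{L}\left\{(\Theta_{(1)}, S_{(1)}), \dots, (\Theta_{(k_N)}, S_{(k_N)}) \,\middle|\, d_{(k_N+1)}\right\} = \mathcal{L}\left\{(\tilde\Theta_{(1)}, \tilde S_{(1)}), \dots, (\tilde\Theta_{(k_N)}, \tilde S_{(k_N)})\right\}.$$

Note, since $s_0$ belongs by assumption to the support of $\bar f$, that the
constant $c_{d_{(k_N+1)}}$ of Theorem 2.1 is positive. This theorem may be
regarded as an extension of a result of Kaufmann and Reiss [23], who provide
explicit representations of the conditional distribution of an empirical
point process given some order statistics. However, the present Bayesian
setting is not covered by the conclusions of [23], and our proof actually
relies on much simpler arguments.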



The main message of Theorem 2.1 is that, conditionally on $d_{(k_N+1)}$, one
can consider the $k_N$-tuple
$(\Theta_{(1)}, S_{(1)}), \dots, (\Theta_{(k_N)}, S_{(k_N)})$ as an ordered
sample drawn according to the probability density (2.1). Alternatively, the
(unordered) simulated values may be treated like i.i.d. realizations of
variables with common density proportional to
$\mathbf{1}_{[\|s - s_0\| \le d_{(k_N+1)}]} f(\theta, s)$. Conditionally
on $d_{(k_N+1)}$, the accepted $\theta_j$'s are nothing but i.i.d.
realizations of the probability density

$$\frac{\int_{B_m(s_0, d_{(k_N+1)})} f(\theta, s)\,ds}{\int_{\mathbb{R}^p} \int_{B_m(s_0, d_{(k_N+1)})} f(\theta, s)\,d\theta\,ds}.$$

Although this conclusion is intuitively clear, its proof requires a careful
mathematical analysis.

As will be made transparent in the next section, Theorem 2.1 plays a key
role in the mathematical analysis of the natural conditional density estimate
associated with ABC methodology. In fact, investigating ABC in terms of
nearest neighbors has other important consequences. Suppose, for example,
that we are interested in estimating some finite conditional expectation
$\mathbb{E}[\varphi(\Theta) | S = s_0]$, where the random variable
$\varphi(\Theta)$ is bounded. This includes in particular the important
setting where $\varphi$ is polynomial and one wishes to estimate the
conditional moments of $\Theta$. Then, provided $k_N/\log\log N \to \infty$
and $k_N/N \to 0$ as $N \to \infty$, it can be shown that for almost all
$s_0$ (with respect to the distribution of $S$), with probability 1,

$$\frac{1}{k_N} \sum_{j=1}^{k_N} \varphi\left(\Theta_{(j)}\right) \to \mathbb{E}[\varphi(\Theta) | S = s_0] \quad \text{as } N \to \infty. \tag{2.2}$$
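
The nearest neighbor average of $\varphi$ over the accepted parameters can be sketched as follows. The toy conjugate Gaussian model, the test function and the helper name `abc_conditional_expectation` are assumptions made for illustration only; they are chosen so that the exact conditional expectation is available for comparison.

```python
import math
import numpy as np

def abc_conditional_expectation(phi, theta, s, s0, k):
    """Average phi over the theta_i attached to the k nearest neighbors of s0."""
    nearest = np.argpartition(np.abs(s - s0), k - 1)[:k]
    return phi(theta[nearest]).mean()

rng = np.random.default_rng(1)
N, n_obs, s0, k = 200_000, 20, 0.5, 500
theta = rng.normal(0.0, 1.0, size=N)                 # draws from an assumed N(0, 1) prior
s = rng.normal(theta, 1.0 / np.sqrt(n_obs))          # summary = mean of n_obs N(theta, 1) draws

phi = lambda t: (t <= 0.5).astype(float)             # bounded test function
estimate = abc_conditional_expectation(phi, theta, s, s0, k)

# Exact value under the conjugate posterior N(n*s0/(n+1), 1/(n+1)).
post_mean = n_obs * s0 / (n_obs + 1)
post_sd = 1.0 / math.sqrt(n_obs + 1)
exact = 0.5 * (1.0 + math.erf((0.5 - post_mean) / (post_sd * math.sqrt(2.0))))
print(estimate, exact)
```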



Proof of such a result uses the full power of the vast and rich nearest
neighbor estimation theory. To be more precise, let us make a quick detour
through this theory and consider an i.i.d. sample
$(X_1, Z_1), \dots, (X_N, Z_N)$ taking values in
$\mathbb{R}^m \times \mathbb{R}$, where the output variables $Z_i$'s are
bounded. Assume, to keep things simple, that the $X_i$'s have a probability
density and that our goal is to assess the regression function
$r(x) = \mathbb{E}[Z_1 | X_1 = x]$, $x \in \mathbb{R}^m$. In this context,
the $k$-nearest neighbor regression function estimate of $r$ (Royall [40],
Cover [5], Stone) takes the form

$$\hat r_N(x) = \frac{1}{k_N} \sum_{j=1}^{k_N} Z_{(j)}, \quad x \in \mathbb{R}^m,$$

where $Z_{(j)}$ is the $Z$-observation corresponding to $X_{(j)}$, the
$j$th-closest point to $x$ among $X_1, \dots, X_N$. Denoting by $\mu$ the
distribution of $X_1$, it is proved in Devroye (Theorem 3) that provided
$k_N/\log\log N \to \infty$ and $k_N/N \to 0$, for $\mu$-almost all $x$,

$$\hat r_N(x) \to r(x) \quad \text{with probability 1 as } N \to \infty.$$

This result can be transposed without further effort to our ABC setting via
the correspondence $\varphi(\Theta) \leftrightarrow Z$ and
$S \leftrightarrow X$, thereby establishing validity of (2.2). The decisive
step towards that conclusion is accomplished by making a connection between
ABC and nearest neighbor methodology. We leave it to the reader to draw
their own conclusions as to further possible utilizations of this
correspondence.

3 Mean square error consistency

As in Section 2, we keep the conditioning vector $s_0$ fixed and consider the
i.i.d. sample $(\Theta_1, S_1), \dots, (\Theta_N, S_N)$, where each pair is
distributed according to the probability density
$f(\theta, s) = f(s|\theta)\pi(\theta)$ on $\mathbb{R}^p \times \mathbb{R}^m$.
Based on this sample, our new objective is to estimate the posterior density
$g(\theta_0|s_0)$, $\theta_0 \in \mathbb{R}^p$. This estimation step is an
important ingredient of the Bayesian analysis, whether this may be for
visualization purposes or more involved mathematical achievements.

As exposed in the introduction, the natural ABC-companion estimate of
$g(\theta_0|s_0)$ takes the form

$$\hat g_N(\theta_0) = \frac{1}{k_N h_N^p} \sum_{j=1}^{k_N} K\left(\frac{\theta_0 - \Theta_{(j)}}{h_N}\right), \quad \theta_0 \in \mathbb{R}^p, \tag{3.1}$$

where $\{h_N\}$ is a sequence of positive real numbers (bandwidth) and $K$ is
a nonnegative Borel measurable function (kernel) on $\mathbb{R}^p$. (To
reduce the notational burden, we dropped the dependency of the estimate upon
$s_0$, keeping in mind that $s_0$ is held fixed.) Kernel estimates were
originally studied in density estimation by Rosenblatt [39] and Parzen [33],
and were later introduced in regression estimation by Nadaraya [32, 33] and
Watson [46]. The $k$-nearest neighbor method for density estimation purposes
goes back to Fix and Hodges [12] and Loftsgaarden and Quesenberry [24].
Kernel estimates have been extended to the conditional density setting by
Rosenblatt [39], who proceeds by separately inferring the bivariate density
$f(\theta, s)$ of $(\Theta, S)$ and the marginal density of $S$. Rosenblatt's
estimate reads

$$\tilde g_N(\theta_0) = \frac{\sum_{i=1}^{N} L\left(\frac{s_0 - S_i}{\delta_N}\right) K\left(\frac{\theta_0 - \Theta_i}{h_N}\right)}{h_N^p \sum_{i=1}^{N} L\left(\frac{s_0 - S_i}{\delta_N}\right)},$$

where $L$ is a kernel in $\mathbb{R}^m$, and $\delta_N$ is the corresponding
bandwidth. ABC-compatible estimates of this type have been discussed in
Beaumont et al. [3] and further explored by Blum [1] (additional references
for the conditional density estimation problem are Hyndman et al. [20],
Györfi and Kohler [14], Faugeras [11] and the survey of Hansen [17]).

The conditional density estimate we are interested in is different, in the
sense that it has both the flavor of a $k$-nearest neighbor approach (it
retains only the $k_N$-nearest neighbors of $s_0$ among $S_1, \dots, S_N$)
and a kernel method (it smoothes the corresponding $\Theta_j$'s). This is the
only conditional density estimate of this type we are aware of. Obviously,
the main advantage of (3.1) over its kernel-type competitors is its
simplicity (it does not involve evaluation of a ratio, with a denominator
that can be small), which makes it easy to implement.
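
To make the construction concrete, here is a minimal sketch of the hybrid estimate: select the $k_N$ nearest neighbors of $s_0$, then smooth the corresponding parameters with a kernel. The Gaussian kernel, the toy conjugate model and the function name `abc_knn_kernel_density` are assumptions chosen for illustration, not prescriptions of the paper.

```python
import numpy as np

def abc_knn_kernel_density(theta0, theta, s, s0, k, h):
    """Hybrid k-NN / Gaussian-kernel estimate of the posterior density.

    theta:  (N, p) simulated parameters;  s: (N, m) simulated summaries;
    theta0: (n_eval, p) evaluation points; s0: (m,) observed summary.
    """
    p = theta.shape[1]
    dist = np.linalg.norm(s - s0, axis=1)
    nearest = np.argpartition(dist, k - 1)[:k]        # k nearest neighbors of s0
    diff = (theta0[:, None, :] - theta[nearest][None, :, :]) / h
    gauss = np.exp(-0.5 * np.sum(diff**2, axis=2)) / (2.0 * np.pi) ** (p / 2)
    return gauss.sum(axis=1) / (k * h**p)

# Assumed toy model with p = m = 1: theta ~ N(0, 1), summary ~ N(theta, 1/n_obs).
rng = np.random.default_rng(2)
N, n_obs, k, h = 400_000, 20, 2000, 0.1
theta = rng.normal(size=(N, 1))
s = rng.normal(theta, 1.0 / np.sqrt(n_obs))
s0 = np.array([0.5])

grid = np.linspace(-1.5, 2.5, 401).reshape(-1, 1)
dens = abc_knn_kernel_density(grid, theta, s, s0, k, h)
mass = dens.sum() * (grid[1, 0] - grid[0, 0])          # Riemann sum, close to 1

post_mean = n_obs * s0[0] / (n_obs + 1)
peak = abc_knn_kernel_density(np.array([[post_mean]]), theta, s, s0, k, h)[0]
exact_peak = np.sqrt((n_obs + 1) / (2.0 * np.pi))      # exact posterior density at its mode
print(mass, peak, exact_peak)
```

No ratio of estimated densities appears: the only division is by the deterministic quantity $k_N h_N^p$. In this run the estimated peak is expected to sit somewhat below the exact one because of the kernel smoothing.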

Our goal in this section is to investigate some consistency properties of the
ABC-companion estimate (3.1). Pointwise mean square error consistency is
proved in Theorem 3.3 and mean integrated square error consistency is
established in Theorem 3.4. We stress that this part of the document is
concerned with minimal conditions of convergence. We did indeed try to reduce
as much as possible the assumptions on the various unknown probability
densities by resorting to real analysis arguments.

The following assumptions on the kernel will be needed throughout the paper. 



Assumption [K1] The kernel $K$ is nonnegative and belongs to
$L^1(\mathbb{R}^p)$, with $\int_{\mathbb{R}^p} K(\theta)\,d\theta = 1$.
Moreover, the function $\sup_{\|y\| \ge \|\theta\|} |K(y)|$,
$\theta \in \mathbb{R}^p$, is in $L^1(\mathbb{R}^p)$.

Assumption set [K1] is in no way restrictive and is satisfied by all standard
kernels such as, for example, the naive kernel

$$K(\theta) = \frac{1}{V_p}\,\mathbf{1}_{B_p(0,1)}(\theta),$$

where $V_p$ is the volume of the closed unit ball $B_p(0, 1)$ in
$\mathbb{R}^p$, or the Gaussian kernel

$$K(\theta) = \frac{1}{(2\pi)^{p/2}} \exp\left(-\|\theta\|^2/2\right).$$

We recall for further references that, in the $p$-dimensional Euclidean
space,

$$V_p = \frac{\pi^{p/2}}{\Gamma\left(1 + \frac{p}{2}\right)},$$

where $\Gamma(\cdot)$ is the gamma function. Everywhere in the document, we
denote by $\lambda_p$ (respectively, $\lambda_m$) the Lebesgue measure on
$\mathbb{R}^p$ (respectively, $\mathbb{R}^m$) and set, for any positive $h$,

$$K_h(\theta) = \frac{1}{h^p} K(\theta/h), \quad \theta \in \mathbb{R}^p.$$

We note once and for all that, under Assumption [K1],

$$\int_{\mathbb{R}^p} K_h(\theta)\,d\theta = 1.$$
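
The two standard kernels and the rescaling $K_h$ can be written down directly. The sketch below (hypothetical function names, a simple Riemann sum in dimension $p = 1$) is only meant to illustrate the unit-ball volume formula and the fact that rescaling preserves the total mass.

```python
import math
import numpy as np

def unit_ball_volume(p):
    """Volume V_p of the closed unit ball of R^p."""
    return math.pi ** (p / 2) / math.gamma(1 + p / 2)

def naive_kernel(theta):
    """Uniform density on the unit ball; rows of theta are points of R^p."""
    theta = np.atleast_2d(theta)
    inside = np.linalg.norm(theta, axis=1) <= 1.0
    return inside / unit_ball_volume(theta.shape[1])

def gaussian_kernel(theta):
    """Standard Gaussian density on R^p; rows of theta are points of R^p."""
    theta = np.atleast_2d(theta)
    p = theta.shape[1]
    return np.exp(-0.5 * np.sum(theta**2, axis=1)) / (2.0 * math.pi) ** (p / 2)

def rescale(kernel, h):
    """Return the function K_h(theta) = h^{-p} K(theta / h)."""
    def k_h(theta):
        theta = np.atleast_2d(theta)
        return kernel(theta / h) / h ** theta.shape[1]
    return k_h

# Riemann-sum check in dimension p = 1 that K_h still integrates to one.
grid = np.linspace(-5.0, 5.0, 100_001).reshape(-1, 1)
dx = grid[1, 0] - grid[0, 0]
mass_naive = rescale(naive_kernel, 0.5)(grid).sum() * dx
mass_gauss = rescale(gaussian_kernel, 0.5)(grid).sum() * dx
print(unit_ball_volume(2), mass_naive, mass_gauss)
```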



The first crucial result from real analysis that is needed here is the
so-called Lebesgue's differentiation theorem (see, e.g., Wheeden and Zygmund,
Theorem 7.16), which asserts that if $\varphi$ is a locally integrable
function in $\mathbb{R}^n$, then

$$\frac{1}{V_n \delta^n} \int_{B_n(x_0, \delta)} |\varphi(x) - \varphi(x_0)|\,dx \to 0 \quad \text{as } \delta \to 0^+$$

for $\lambda_n$-almost all $x_0 \in \mathbb{R}^n$. A point $x_0$ at which
this statement is valid is called a Lebesgue point of $\varphi$. In the
proofs, we shall in fact need some convolution-type variations around the
Lebesgue's theorem regarding the prior density $\pi$. These important results
are gathered in the next theorem, whose proof can be found in Stein
(Theorem 1, page 5 and Theorem 2, pages 62-63).

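Theorem 3.1 Let $K$ be a kernel satisfying assumption [K1], and let the
function $\pi^\star$ be defined on $\mathbb{R}^p$ by

$$\theta_0 \mapsto \pi^\star(\theta_0) = \sup_{h > 0} \left[\int_{\mathbb{R}^p} K_h(\theta_0 - \theta)\pi(\theta)\,d\theta\right].$$

(i) For $\lambda_p$-almost all $\theta_0 \in \mathbb{R}^p$, one has

$$\int_{\mathbb{R}^p} K_h(\theta_0 - \theta)\pi(\theta)\,d\theta \to \pi(\theta_0) \quad \text{as } h \to 0^+.$$

(ii) The quantity $\pi^\star(\theta_0)$ is finite for $\lambda_p$-almost all $\theta_0 \in \mathbb{R}^p$.

(iii) For any $q > 1$, the function $\pi^\star$ is in $L^q(\mathbb{R}^p)$ whenever $\pi$ is in $L^q(\mathbb{R}^p)$.

When $K$ is chosen to be the naive kernel, the function $\pi^\star$ of
Theorem 3.1 is called the Hardy-Littlewood maximal function of $\pi$. It
should be understood as a gauge of the size of the averages of $\pi$ around
$\theta_0$.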

We shall also need an equivalent of Theorem 3.1 for the joint density $f$,
which this time is defined on $\mathbb{R}^p \times \mathbb{R}^m$. Things turn
out to be slightly more complicated in this case if one wants pairs of points
$(\theta_0, s_0)$ to be approached as $(h, \delta) \to (0^+, 0^+)$ by general
product kernels over $\mathbb{R}^p \times \mathbb{R}^m$. These kernels take
the form $K_h(\cdot) \otimes L_\delta(\cdot)$, without any restriction on the
joint behavior of $h$ and $\delta$ (in particular, we do not impose that
$h = \delta$). The so-called Jessen-Marcinkiewicz-Zygmund theorem [21] (see
also Zygmund, Chapter 17, pages 305-309) answers the question for naive
kernels, at the price of a slight integrability assumption on $f$. On the
other hand, the literature offers surprisingly little help for general
kernels, with the exception of arguments presented in Devroye and Krzyzak
[9]. This is astonishing since this real analysis issue is at the basis of
pointwise convergence properties of multivariate kernel estimates and indeed
most density estimates. To fill the gap, we begin with the following theorem,
which is tailored to our ABC context (that is, when the second kernel $L$ is
restricted to be the naive one). A more sophisticated result (that is, for
both $K$ and $L$ general kernels) together with interesting new results on
convolution and approximation of the identity are given in the Appendix
section, at the end of the paper. In the sequel, notation $u^+$ means
$\max(u, 0)$.

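Theorem 3.2 Let $K$ be a kernel satisfying assumption [K1], and let the
function $f^\star$ be defined on $\mathbb{R}^p \times \mathbb{R}^m$ by

$$(\theta_0, s_0) \mapsto f^\star(\theta_0, s_0) = \sup_{h > 0,\, \delta > 0} \left[\frac{1}{V_m \delta^m} \int_{\mathbb{R}^p} \int_{B_m(s_0, \delta)} K_h(\theta_0 - \theta) f(\theta, s)\,d\theta\,ds\right].$$

(i) If

$$\int_{\mathbb{R}^p} \int_{\mathbb{R}^m} f(\theta, s) \log^+ f(\theta, s)\,d\theta\,ds < \infty \tag{3.3}$$

then, for $\lambda_p \otimes \lambda_m$-almost all
$(\theta_0, s_0) \in \mathbb{R}^p \times \mathbb{R}^m$,

$$\frac{1}{V_m \delta^m} \int_{\mathbb{R}^p} \int_{B_m(s_0, \delta)} K_h(\theta_0 - \theta) f(\theta, s)\,d\theta\,ds \to f(\theta_0, s_0),$$

provided $(h, \delta) \to (0^+, 0^+)$.

(ii) If condition (3.3) is satisfied, then $f^\star(\theta_0, s_0)$ is finite
for $\lambda_p \otimes \lambda_m$-almost all
$(\theta_0, s_0) \in \mathbb{R}^p \times \mathbb{R}^m$.

(iii) For any $q > 1$, the function $f^\star$ is in
$L^q(\mathbb{R}^p \times \mathbb{R}^m)$ whenever $f$ is in
$L^q(\mathbb{R}^p \times \mathbb{R}^m)$.

A remarkable feature of Theorem 3.2 (i) is that the result is true as soon as
$(h, \delta) \to (0^+, 0^+)$, without any restriction on these parameters.
This comes however at the price of the mild integrability assumption (3.3),
which is true, in particular, if $f$ is in any
$L^q(\mathbb{R}^p \times \mathbb{R}^m)$, $q > 1$.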
Recall that we denote by $\bar f$ the marginal density of $f(\theta, s)$ in
$s$, that is

$$\bar f(s) = \int_{\mathbb{R}^p} f(\theta, s)\,d\theta, \quad s \in \mathbb{R}^m.$$

We are now in a position to state the two main results of this section. 

Theorem 3.3 (Pointwise mean square error consistency) Assume that the
kernel $K$ is bounded and satisfies assumption [K1]. Assume, in addition,
that the joint probability density $f$ is such that

$$\int_{\mathbb{R}^p} \int_{\mathbb{R}^m} f(\theta, s) \log^+ f(\theta, s)\,d\theta\,ds < \infty.$$

Then, for $\lambda_p \otimes \lambda_m$-almost all
$(\theta_0, s_0) \in \mathbb{R}^p \times \mathbb{R}^m$ with
$\bar f(s_0) > 0$, if $k_N \to \infty$, $k_N/N \to 0$, $h_N \to 0$ and
$k_N h_N^p \to \infty$,

$$\mathbb{E}\left[\hat g_N(\theta_0) - g(\theta_0|s_0)\right]^2 \to 0 \quad \text{as } N \to \infty.$$

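It is stressed that the integral assumption required on $f$ is mild. It is
for example satisfied whenever $f$ is bounded from above or whenever $f$
belongs to $L^q(\mathbb{R}^p \times \mathbb{R}^m)$ with $q > 1$. There are,
however, situations where this assumption is not satisfied. As an
illustration, take $p = m = 1$ and let

$$\mathcal{T} = \left\{(\theta, s) \in \mathbb{R} \times \mathbb{R} : \theta > 0,\ s > 0,\ \theta + s < \tfrac{1}{2}\right\}.$$

Clearly,

$$\int\!\!\int_{\mathcal{T}} \frac{1}{(\theta + s)^2 \log^2(\theta + s)}\,d\theta\,ds < \infty.$$

Choose

$$f(\theta, s) = \frac{C}{(\theta + s)^2 \log^2(\theta + s)}\,\mathbf{1}_{[(\theta, s) \in \mathcal{T}]},$$

where $C$ is a normalizing constant ensuring that $f$ is a probability
density. Then

$$\int_{\mathbb{R}^p} \int_{\mathbb{R}^m} f(\theta, s)\,d\theta\,ds = 1$$

whereas

$$\int_{\mathbb{R}^p} \int_{\mathbb{R}^m} f(\theta, s) \log^+ f(\theta, s)\,d\theta\,ds = \infty.$$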
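
For the reader who wishes to check these two claims, both integrals reduce to one-dimensional ones by slicing the triangle along the lines $\theta + s = u$. The computation below is a sketch added for illustration, assuming the triangle is bounded by $\theta + s < 1/2$; it is not taken from the original argument.

```latex
% Change of variables (theta, s) -> (u, theta) with u = theta + s, 0 < theta < u (Jacobian 1):
\[
\int\!\!\int_{\mathcal{T}} \psi(\theta+s)\,d\theta\,ds = \int_0^{1/2} u\,\psi(u)\,du
\qquad \text{for any measurable } \psi \ge 0 .
\]
% Normalization: take psi(u) = 1 / (u^2 log^2 u).
\[
\int_0^{1/2} \frac{du}{u\log^2 u}
= \left[-\frac{1}{\log u}\right]_{0^+}^{1/2}
= \frac{1}{\log 2},
\qquad \text{so that } C = \log 2 .
\]
% Divergence: on the triangle f > 1, and for u small enough
% log f(u) = log C + 2 log(1/u) - 2 log log(1/u) >= log(1/u), whence, for some small u_0 > 0,
\[
\int\!\!\int f\log^+ f\,d\theta\,ds
\;\ge\; C\int_0^{u_0} \frac{\log(1/u)}{u\log^2 u}\,du
= C\int_0^{u_0} \frac{du}{u\log(1/u)} = \infty .
\]
```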



Theorem 3.4 below states that the estimate is also consistent with respect
to the mean integrated square error criterion.
Theorem 3.4 (Mean integrated square error consistency) Assume that the
kernel $K$ belongs to $L^2(\mathbb{R}^p)$ and satisfies assumption [K1].
Assume, in addition, that the joint probability density $f$ and the prior
$\pi$ are in $L^2(\mathbb{R}^p \times \mathbb{R}^m)$ and
$L^2(\mathbb{R}^p)$, respectively. Then, for $\lambda_m$-almost all
$s_0 \in \mathbb{R}^m$ with $\bar f(s_0) > 0$, if $k_N \to \infty$,
$k_N/N \to 0$, $h_N \to 0$ and $k_N h_N^p \to \infty$,

$$\mathbb{E}\left[\int_{\mathbb{R}^p} \left[\hat g_N(\theta_0) - g(\theta_0|s_0)\right]^2 d\theta_0\right] \to 0 \quad \text{as } N \to \infty.$$

Here again, the regularity assumptions required on $f$ and $\pi$ are minimal.
One could envisage an additional degree of smoothing in the estimate (3.1)
by observing that taking the $k_N$ nearest neighbors of $s_0$ can be viewed
as the uniform kernel case of the more general quantity



which allows unequal weights to be given to the $S_j$'s. The corresponding
smoothed conditional density estimate is defined by



Thus, $\hat g_N$ is the uniform kernel case of this smoothed estimate. The
asymptotic properties of the smoothed estimate, which are beyond the scope of
the present article, will be explored elsewhere by the authors. A good
starting point are the papers by Moore and Yackel [30] and Mack and
Rosenblatt [25], who study various properties of similar kernel-type nearest
neighbor procedures for density estimation.

4 Rates of convergence 

In this section, we go one step further in the analysis of the ABC-companion
estimate $\hat g_N$ by studying its mean integrated square error rates of
convergence. We follow the notation of Section 3 and try to keep the
assumptions on unknown mathematical objects as mild as possible. Introduce
the multi-index notation

$$|\beta| = \beta_1 + \dots + \beta_n, \quad \beta! = \beta_1! \cdots \beta_n!, \quad x^\beta = x_1^{\beta_1} \cdots x_n^{\beta_n}$$

for $\beta = (\beta_1, \dots, \beta_n) \in \mathbb{N}^n$ and
$x \in \mathbb{R}^n$. If all the $k$-order derivatives of some function
$\varphi : \mathbb{R}^n \to \mathbb{R}$ are continuous at
$x_0 \in \mathbb{R}^n$ then, by Schwarz's theorem, one can change the order
of mixed derivatives at $x_0$, so the notation

$$D^\beta \varphi(x_0) = \frac{\partial^{|\beta|} \varphi(x_0)}{\partial x_1^{\beta_1} \cdots \partial x_n^{\beta_n}}, \quad |\beta| \le k,$$

for the higher-order partial derivatives is justified in this situation.

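In the sequel, we shall need the following sets of assumptions. Recall that
the collection of all $s_0 \in \mathbb{R}^m$ with
$\int_{B_m(s_0, \delta)} \bar f(s)\,ds > 0$ for all $\delta > 0$ is called
the support of $\bar f$.

Assumption [A1] The marginal probability density $\bar f$ has compact
support with diameter $L > 0$ and is three times continuously differentiable.

Assumption [A2] The joint probability density $f$ is in
$L^2(\mathbb{R}^p \times \mathbb{R}^m)$. Moreover, for fixed $s_0$, the
functions

$$\theta_0 \mapsto \frac{\partial^2 f(\theta_0, s_0)}{\partial \theta_{i_1} \partial \theta_{i_2}}, \quad 1 \le i_1, i_2 \le p, \qquad \text{and} \qquad \theta_0 \mapsto \frac{\partial^2 f(\theta_0, s_0)}{\partial s_j^2}, \quad 1 \le j \le m,$$

are defined and belong to $L^2(\mathbb{R}^p)$.

Assumption [A3] The joint probability density $f$ is three times
continuously differentiable on $\mathbb{R}^p \times \mathbb{R}^m$ and, for
any multi-index $\beta$ satisfying $|\beta| = 3$,

$$\sup_{s \in \mathbb{R}^m} \int_{\mathbb{R}^p} \left[D^\beta f(\theta, s)\right]^2 d\theta < \infty.$$

It is also necessary to put some mild additional restrictions on the kernel.

Assumption [K2] The kernel $K$ is symmetric and belongs to
$L^2(\mathbb{R}^p)$. Moreover, for any multi-index $\beta$ satisfying
$|\beta| \in \{1, 2, 3\}$,

$$\int_{\mathbb{R}^p} |\theta^\beta|\, K(\theta)\,d\theta < \infty.$$

We finally define

$$\xi_0 = \inf_{0 < \delta \le L} \frac{1}{\delta^m} \int_{B_m(s_0, \delta)} \bar f(s)\,ds,$$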
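and set

$$D_m(k_N) = \frac{m}{\xi_0^{2/m}(m-2)} \left(\frac{k_N+1}{N+1}\right)^{2/m} - \frac{L^{2-m}}{\xi_0(m/2-1)}\, \frac{k_N+1}{N+1},$$

$$\Delta_m(k_N) = \frac{m}{\xi_0^{4/m}(m-4)} \left(\frac{k_N+1}{N+1}\right)^{4/m} - \frac{L^{4-m}}{\xi_0(m/4-1)}\, \frac{k_N+1}{N+1},$$

$$D(k_N) = \frac{1}{\xi_0} \left[1 + \log\left(\xi_0 L^2\, \frac{N+1}{k_N+1}\right)\right] \frac{k_N+1}{N+1},$$

$$\Delta(k_N) = \frac{1}{\xi_0} \left[1 + \log\left(\xi_0 L^4\, \frac{N+1}{k_N+1}\right)\right] \frac{k_N+1}{N+1}.$$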
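
The logarithmic quantities $D(k_N)$ and $\Delta(k_N)$ can be read as the limits of $D_m(k_N)$ and $\Delta_m(k_N)$ when $m \to 2$ and $m \to 4$, respectively. The short numerical check below is a sketch under the assumption that the four quantities have the form coded here (written in terms of $a = (k_N+1)/(N+1)$); the function names and the parameter values are hypothetical.

```python
import math

def D_m(a, m, xi0, L):
    """Assumed form of D_m for m != 2, with a = (k_N + 1) / (N + 1)."""
    return (m / (xi0 ** (2 / m) * (m - 2)) * a ** (2 / m)
            - L ** (2 - m) / (xi0 * (m / 2 - 1)) * a)

def Delta_m(a, m, xi0, L):
    """Assumed form of Delta_m for m != 4."""
    return (m / (xi0 ** (4 / m) * (m - 4)) * a ** (4 / m)
            - L ** (4 - m) / (xi0 * (m / 4 - 1)) * a)

def D_log(a, xi0, L):
    """Assumed logarithmic form used when m = 2."""
    return (1 + math.log(xi0 * L**2 / a)) * a / xi0

def Delta_log(a, xi0, L):
    """Assumed logarithmic form used when m = 4."""
    return (1 + math.log(xi0 * L**4 / a)) * a / xi0

a, xi0, L = 1e-3, 0.7, 2.0          # arbitrary illustrative values
near_2 = D_m(a, 2 + 1e-6, xi0, L)
near_4 = Delta_m(a, 4 + 1e-6, xi0, L)
print(near_2, D_log(a, xi0, L))
print(near_4, Delta_log(a, xi0, L))
```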

The next theorem makes precise the mean integrated square error rates of
convergence of $\hat g_N(\cdot)$ towards $g(\cdot|s_0)$.

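Theorem 4.1 Let $K$ be a kernel satisfying assumptions [K1] and [K2]. Let
$s_0$ be a Lebesgue point of $\bar f$ such that $\bar f(s_0) > 0$. Assume
that Assumptions [A1]-[A3] are satisfied. Then, letting

$$\phi_1(\theta_0, s_0) = \frac{1}{2} \sum_{i_1, i_2 = 1}^{p} \frac{\partial^2 f(\theta_0, s_0)}{\partial \theta_{i_1} \partial \theta_{i_2}} \int_{\mathbb{R}^p} \theta_{i_1} \theta_{i_2} K(\theta)\,d\theta,$$

$$\phi_2(\theta_0, s_0) = \frac{1}{2m+4} \sum_{j=1}^{m} \frac{\partial^2 f(\theta_0, s_0)}{\partial s_j^2}, \qquad \phi_3(s_0) = \frac{1}{2m+4} \sum_{j=1}^{m} \frac{\partial^2 \bar f(s_0)}{\partial s_j^2},$$

and

$$\Phi_1(s_0) = \frac{1}{\bar f^2(s_0)} \int_{\mathbb{R}^p} \phi_1^2(\theta_0, s_0)\,d\theta_0,$$

$$\Phi_2(s_0) = \frac{1}{\bar f^4(s_0)} \int_{\mathbb{R}^p} \left[\phi_2(\theta_0, s_0)\bar f(s_0) - \phi_3(s_0) f(\theta_0, s_0)\right]^2 d\theta_0,$$

$$\Phi_3(s_0) = \frac{2}{\bar f^3(s_0)} \int_{\mathbb{R}^p} \phi_1(\theta_0, s_0) \left[\phi_2(\theta_0, s_0)\bar f(s_0) - \phi_3(s_0) f(\theta_0, s_0)\right] d\theta_0,$$

one has

1. For $m = 2$,

$$\mathbb{E}\left[\int_{\mathbb{R}^p} \left[\hat g_N(\theta_0) - g(\theta_0|s_0)\right]^2 d\theta_0\right] \le \left[\Phi_1(s_0) h_N^4 + \Phi_2(s_0) \Delta_2(k_N) + \Phi_3(s_0) h_N^2 D(k_N) + \frac{\int_{\mathbb{R}^p} K^2(\theta)\,d\theta}{k_N h_N^p}\right] \times (1 + o(1)).$$

2. For $m = 4$,

$$\mathbb{E}\left[\int_{\mathbb{R}^p} \left[\hat g_N(\theta_0) - g(\theta_0|s_0)\right]^2 d\theta_0\right] \le \left[\Phi_1(s_0) h_N^4 + \Phi_2(s_0) \Delta(k_N) + \Phi_3(s_0) h_N^2 D_4(k_N) + \frac{\int_{\mathbb{R}^p} K^2(\theta)\,d\theta}{k_N h_N^p}\right] \times (1 + o(1)).$$

3. For $m \notin \{2, 4\}$,

$$\mathbb{E}\left[\int_{\mathbb{R}^p} \left[\hat g_N(\theta_0) - g(\theta_0|s_0)\right]^2 d\theta_0\right] \le \left[\Phi_1(s_0) h_N^4 + \Phi_2(s_0) \Delta_m(k_N) + \Phi_3(s_0) h_N^2 D_m(k_N) + \frac{\int_{\mathbb{R}^p} K^2(\theta)\,d\theta}{k_N h_N^p}\right] \times (1 + o(1)).$$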

By balancing the terms in Theorem 14. 1| we are led to the following useful 
corollary. 

Corollary 4.1 (Rates of convergence) Under the conditions of Theorem 
4-l\ one has 



1. For m G {1, 2, 3} ; there exists a sequence {k^} with k^ oc Np+ s and 
a sequence {h^} with oc N~v+& such that 



E 



[^v(6/ o )-2(0o|so)] 2 d0 o 
ftr^ 1 ^ ] + $ 2 (s ) + / K 2 (0)d0) N-^+o (n~M 
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P+4 



2. For m = 4, there exists a sequence {k^} with k^ oc N?+ s and a se- 
quence {h N } with oc N~p+s such that 



E 



[g N (0 o )-g(0 Q \s o )} 2 d0 o 
4$i(s x 



Up + K 



N p+8 log AT + o AT p + 8 log AT 



P+4 



5. For m > 4, i/iere exzsis a sequence {k N } with k N oc A^ m +p+ 4 and a 
sequence {h^} with oc A^ _m +p+ 4 suc/i £/iat 



E 



[^(0o)-^(^o|so)] 2 d0 o 
m<l>i(so 



Ao /m (™-±) 

+ O ( V ~ m+p+4 



+ $ 2 (s ) + - + 



e 2/m (m - 2) 



K 2 (0)dO N~^+t+* 
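The exponents for $m > 4$ can be recovered by a short order-of-magnitude computation, sketched here under simplifying assumptions: constants are ignored, $\mathbb{E}[d^4_{(k_N+1)}]$ is taken to be of order $(k_N/N)^{4/m}$ and $\mathbb{E}[d^2_{(k_N+1)}]$ of order $(k_N/N)^{2/m}$, as the bounds of Propositions 6.1 and 6.2 suggest. This is a heuristic reading, not a substitute for the statement of the corollary.

```latex
% Leading terms of the bound for m > 4, up to constants:
% squared kernel bias, squared nearest-neighbor bias, variance.
h_N^4 \;\asymp\; \Big(\frac{k_N}{N}\Big)^{4/m} \;\asymp\; \frac{1}{k_N h_N^p}
% First balance:   k_N \asymp N h_N^m.
% Second balance:  N h_N^{m+p} \asymp h_N^{-4}, hence
h_N \asymp N^{-\frac{1}{m+p+4}}, \qquad
k_N \asymp N^{\frac{p+4}{m+p+4}}, \qquad
\text{rate} \asymp N^{-\frac{4}{m+p+4}}.
% The cross term h_N^2 (k_N/N)^{2/m} is then of the same order h_N^4.
```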



Several important remarks are in order. 

1. The distinction between the cases $m \in \{1, 2, 3\}$, $m = 4$ and $m >
4$ may seem unnatural at first sight. However, such low-dimensional
phenomena are also known to hold for the classical $k$-nearest neighbor
regression function estimate, which does not achieve the optimal rates
in dimensions 1 and 2 (see, e.g., Problems 6.1 and 6.7 in Györfi et
al. [US Chapter 3]).

2. From a practical perspective, the fundamental problem is that of the
joint choice of $k_N$ and $h_N$ in the absence of a priori information
regarding the posterior $g(\cdot\,|\,s_0)$. Various bandwidth selection rules for
conditional density estimates have been proposed in the literature (see, e.g.,
Bashtannyk and Hyndman [TJ, Hall et al. [IB], Fan and Yim [TO]).
However, most if not all of these procedures pertain to kernel-type estimates
and are difficult to adapt to our nearest-neighbor setting. Moreover,
they are tailored to global statistical performance criteria, whereas the
problem we are facing is local since $s_0$ is held fixed. Devising a good
methodology to automatically select both parameters $k_N$ and $h_N$ as a
function of $s_0$ necessitates a specific analysis, which we believe is
beyond the scope of the present paper.
3. Nevertheless, Corollary 4.1 provides a useful insight into the proportion
of simulated values which should be accepted by the algorithm. For
example, for $m > 4$, a rough rule of thumb is obtained by taking
$k_N \approx N^{(p+4)/(m+p+4)}$, so that a fraction of about $k_N/N \approx N^{-m/(m+p+4)}$
ABC-simulations should not be rejected.
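The rule of thumb of the last remark is easy to tabulate. The sketch below sets all proportionality constants to 1, which is an assumption made purely for illustration (the corollary only gives orders of magnitude), and the function name is hypothetical.

```python
def abc_rule_of_thumb(N, p, m):
    """Rough k_N, h_N and acceptance fraction for m > 4 (constants set to 1)."""
    if m <= 4:
        raise ValueError("this rule of thumb is stated for m > 4 only")
    k = N ** ((p + 4) / (m + p + 4))      # number of accepted simulations
    h = N ** (-1 / (m + p + 4))           # kernel bandwidth
    return {"k": max(1, round(k)), "h": h, "accept_fraction": k / N}
```

For instance, with $N = 10^6$ simulations, $p = 2$ and $m = 6$, the exponent $(p+4)/(m+p+4)$ equals $1/2$, so about $10^3$ simulations (a fraction of roughly $10^{-3}$) would be kept.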



5 Proofs 



5.1 Proof of Theorem 2.1

Throughout the proof, we let $\Sigma$ be the permutation of $\{1, \ldots, N\}$ such that
$S_{\Sigma(i)}$ is the $i$th nearest neighbor of $s_0$ for all $i$. We note once and for all
that $\Sigma$ is a random variable uniformly picked in the set of all permutations
of $\{1, \ldots, N\}$, since the pairs $(\Theta_i, S_i)$, $i = 1, \ldots, N$, are independent and
identically distributed. To lighten the notation, we suppress the index $N$
in $k_N$ and write $k$ instead. We let as well $\mathcal{C}(N, k)$ be the $k$-combinations of
$\{1, \ldots, N\}$, with cardinality $\binom{N}{k}$.

Denote by $(\tilde\Theta_1, \tilde S_1), \ldots, (\tilde\Theta_k, \tilde S_k)$ independent and identically distributed
random variables, with common probability density

$$\frac{\mathbf{1}_{[\|s - s_0\| \le d_{(k+1)}]}}{C_{d_{(k+1)}}}\, f(\theta, s), \qquad (5.1)$$

where the normalizing constant $C_{d_{(k+1)}}$ is defined by

$$C_{d_{(k+1)}} = \int_{\mathbb{R}^p} \int_{B_m(s_0, d_{(k+1)})} f(\theta, s)\, d\theta\, ds.$$

Note, since $s_0$ belongs by assumption to the support of $\bar f$, that the constant
$C_{d_{(k+1)}}$ is positive.

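To prove the first statement of the theorem, it is enough to establish that,
for any test functions $\Phi$ and $\varphi$, with $\Phi$ symmetric in its arguments, one has

$$\mathbb{E}\left[\Phi\big((\Theta_{(1)}, S_{(1)}), \ldots, (\Theta_{(k)}, S_{(k)})\big)\, \varphi(d_{(k+1)})\right]
= \mathbb{E}\left[\Phi\big((\tilde\Theta_1, \tilde S_1), \ldots, (\tilde\Theta_k, \tilde S_k)\big)\, \varphi(d_{(k+1)})\right].$$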



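To this aim, first observe, since $\Phi$ is symmetric, that

$$\mathbb{E}\left[\Phi\big((\Theta_{(1)}, S_{(1)}), \ldots, (\Theta_{(k)}, S_{(k)})\big)\, \varphi(d_{(k+1)})\right]
= \mathbb{E}\left[\Phi\big((\Theta_{\Sigma(1)}, S_{\Sigma(1)}), \ldots, (\Theta_{\Sigma(k)}, S_{\Sigma(k)})\big)\, \varphi(d_{\Sigma(k+1)})\right].$$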
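Therefore,

$$\mathbb{E}\left[\Phi\big((\Theta_{(1)}, S_{(1)}), \ldots, (\Theta_{(k)}, S_{(k)})\big)\, \varphi(d_{(k+1)})\right]$$
$$= \sum_{\{\sigma_1, \ldots, \sigma_k\} \in \mathcal{C}(N,k)} \mathbb{E}\left[\Phi\big((\Theta_{\sigma_1}, S_{\sigma_1}), \ldots, (\Theta_{\sigma_k}, S_{\sigma_k})\big)\, \varphi(d_{\Sigma(k+1)})\, \mathbf{1}_{[\{\Sigma(1), \ldots, \Sigma(k)\} = \{\sigma_1, \ldots, \sigma_k\}]}\right]$$
$$= \binom{N}{k}\, \mathbb{E}\left[\Phi\big((\Theta_1, S_1), \ldots, (\Theta_k, S_k)\big)\, \varphi(d_{\Sigma(k+1)})\, \mathbf{1}_{[\{\Sigma(1), \ldots, \Sigma(k)\} = \{1, \ldots, k\}]}\right]. \qquad (5.2)$$

In the last equality, we used the fact that all orderings have the same
probability. Next, observe that, with probability 1,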



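$$\mathbf{1}_{[\{\Sigma(1), \ldots, \Sigma(k)\} = \{1, \ldots, k\}]}
= \sum_{\ell = k+1}^{N} \left( \prod_{i=1}^{k} \mathbf{1}_{[d_i < d_\ell]} \prod_{h = k+1}^{N} \mathbf{1}_{[d_h \ge d_\ell]} \right), \qquad (5.3)$$

where $d_i = \|S_i - s_0\|$.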



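Thus, using identity (5.3), we are led to

$$\mathbb{E}\left[\Phi\big((\Theta_1, S_1), \ldots, (\Theta_k, S_k)\big)\, \varphi(d_{\Sigma(k+1)})\, \mathbf{1}_{[\{\Sigma(1), \ldots, \Sigma(k)\} = \{1, \ldots, k\}]}\right]$$
$$= \sum_{\ell = k+1}^{N} \mathbb{E}\left[\Phi\big((\Theta_1, S_1), \ldots, (\Theta_k, S_k)\big)\, \varphi(d_\ell) \prod_{i=1}^{k} \mathbf{1}_{[d_i < d_\ell]} \prod_{h = k+1}^{N} \mathbf{1}_{[d_h \ge d_\ell]}\right]$$
$$= \sum_{\ell = k+1}^{N} \mathbb{E}\left[\mathbb{E}\Big[\Phi\big((\Theta_1, S_1), \ldots, (\Theta_k, S_k)\big) \prod_{i=1}^{k} \mathbf{1}_{[d_i < d_\ell]} \,\Big|\, d_{k+1}, \ldots, d_N\Big]\, \varphi(d_\ell) \prod_{h = k+1}^{N} \mathbf{1}_{[d_h \ge d_\ell]}\right]. \qquad (5.4)$$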



By exploiting the independence of the pairs $(\Theta_i, S_i)$, $i = 1, \ldots, N$, we may
write



E 



$ ((01, Si), ... , (0 fc , Sfc)) Y\ l[dj<4<] | 4+1, • • • , ^Af 



E 



$((0i,si),...,(0 fc ,Sfc))ni^<^i^ 
that is
E 



$ ((01, Si), ... , (0 fc , Sfc)) Yl Md^de] | 4+1, • • • , d 

k 



N 



E 



$ (0i,Si),...,(0 fc ,S fc ) Id 



x E 



II 1 

.i=i 



[di<(if] | 4 



(5.5) 



where the random variables $(\tilde\Theta_1, \tilde S_1), \ldots, (\tilde\Theta_k, \tilde S_k)$ are independent and
identically distributed, with common density (5.1). Specifically,



CI = E 



II 1 [dj<d l ] 

b=i 




/(0,s)d0ds 



Rf JB m (s ,de) 



By Doob's lemma, there exists a (deterministic) measurable function G such 
that 



E 



G{d t ). 



$(^(0i,Si),...,(0 fc ,S fc ) 
Thus, combining (5.4) and (5.5), we obtain

E [<& ((0i, Si), . . . , (0 fc , S fe )) y(ds(fc+i))l[{E(i),...,E(fc)}={i,...,fc}]] 



2V 



2V 



£ E\G(d £ Md £ )l[l [d]<de] x j ] 1 



[dh>ck] 



£=k+l j=l 

= E [G(<is(fc+i))v 9 (4(fc+i))l[{s(i),...,s(fc)}={i,...,fe}]] • 
Finally, plugging this equality into (5.2), we obtain
E [$ ((0(i), S ( i)), . . . , (0 (fc) , S( fc ))) 

k 



E [G(d S (fc+i))^((is(A ; +i))l[{s(i),...,s(fc)}={i,...,fc}]] 



{a 1 ,...,a k }eC(N,k) 



$ (0 CT1 ,S 



CTi y ? • • • ; 



X 1 



[{E(l),...,E(fc)}={«l,...,«r k }] 



where $(\tilde\Theta_1, \tilde S_1), \ldots, (\tilde\Theta_N, \tilde S_N)$ are independent and identically distributed,
with common density (5.1).
Consequently,

e [$ ((e (1) , s (1) ), . . . , (0 (fe) , s w )) ¥»(d (fc+ i))] 

= E $ ((©s(i), S 2 (i)), . . . , (0E(fc), S E ( fe ))J (p(dv( k+1 )) 

= E $ n©(i), S ( i)), . . . , (0 (fc) , S( fe )) ) y»(d(fc+i)) 

= E [$ ((€>!, S x ), ... , (0 fc , S fc )) 
(since $ is symmetric). 

This concludes the proof of the first part of Theorem 2.1.

To prove the second statement, it suffices to establish that, for any test
functions $\Phi$ and $\varphi$ (with $\Phi$ not necessarily symmetric), one has

$$\mathbb{E}\left[\Phi\big((\Theta_{(1)}, S_{(1)}), \ldots, (\Theta_{(k)}, S_{(k)})\big)\, \varphi(d_{(k+1)})\right]
= \mathbb{E}\left[\Phi\big((\tilde\Theta_{(1)}, \tilde S_{(1)}), \ldots, (\tilde\Theta_{(k)}, \tilde S_{(k)})\big)\, \varphi(d_{(k+1)})\right].$$

The arguments may be repeated mutatis mutandis by replacing the $k$-combinations
of $\{1, \ldots, N\}$ by the $k$-permutations $\mathcal{P}(N, k)$ (with cardinality
$N!/(N-k)!$), and replacing identity (5.3) by

$$\mathbf{1}_{[(\Sigma(1), \ldots, \Sigma(k)) = (1, \ldots, k)]}
= \sum_{\ell = k+1}^{N} \left( \mathbf{1}_{[d_1 < \cdots < d_k]}\, \mathbf{1}_{[d_k < d_\ell]} \prod_{h = k+1}^{N} \mathbf{1}_{[d_h \ge d_\ell]} \right).$$

Details are omitted.

5.2 Proof of Theorem 3.3

The proof will strongly rely on Theorem 2.1. It is assumed throughout that
$s_0$ is a Lebesgue point of $\bar f$ ($\lambda_m$-almost all points satisfy this requirement)
such that $\bar f(s_0) > 0$. We note that this forces $s_0$ to belong to the support
of $\bar f$, so that the assumption of Theorem 2.1 is valid. The collection of valid
$s_0$ will vary during the proof, but only on subsets of Lebesgue measure 0.
Similarly, we fix $\theta_0 \in \mathbb{R}^p$, up to subsets of Lebesgue measure 0 which will
appear in the proof.



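First observe that, according to Theorem 2.1,

$$\mathbb{E}\left[\hat g_N(\theta_0) \,\big|\, d_{(k_N+1)}\right]
= \frac{1}{C_{d_{(k_N+1)}}} \int_{\mathbb{R}^p} K_{h_N}(\theta_0 - \theta) \left[ \int_{B_m(s_0, d_{(k_N+1)})} f(\theta, s)\, ds \right] d\theta,$$

where, for any $\delta > 0$,

$$C_\delta = \int_{B_m(s_0, \delta)} \bar f(s)\, ds.$$

Put differently, by Fubini's theorem,

$$\mathbb{E}\left[\hat g_N(\theta_0) \,\big|\, d_{(k_N+1)}\right]
= \frac{1}{C_{d_{(k_N+1)}}} \int_{\mathbb{R}^p} \int_{B_m(s_0, d_{(k_N+1)})} K_{h_N}(\theta_0 - \theta) f(\theta, s)\, d\theta\, ds. \qquad (5.6)$$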

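The proof starts with the variance-bias decomposition

$$\mathbb{E}\left[\hat g_N(\theta_0) - g(\theta_0)\right]^2
= \mathbb{E}\left[\mathbb{E}\left[\big(\hat g_N(\theta_0) - \mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\big)^2 \,\big|\, d_{(k_N+1)}\right]\right]
+ \mathbb{E}\left[\mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}] - g(\theta_0)\right]^2. \qquad (5.7)$$

Our goal is to show that, under our assumptions, both terms on the
right-hand side of (5.7) tend to 0 as $N \to \infty$. We start with the analysis of the
second one, by noting that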



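$$\left|\mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}] - g(\theta_0)\right|
= \left| \frac{1}{C_{d_{(k_N+1)}}} \int_{\mathbb{R}^p} \int_{B_m(s_0, d_{(k_N+1)})} K_{h_N}(\theta_0 - \theta) f(\theta, s)\, d\theta\, ds - \frac{f(\theta_0, s_0)}{\bar f(s_0)} \right|,$$

where we used (5.6) and the definition of $g(\theta_0)$. Equivalently,

$$\left|\mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}] - g(\theta_0)\right|
= \left| \frac{V_m d^m_{(k_N+1)}}{C_{d_{(k_N+1)}}} \cdot \frac{1}{V_m d^m_{(k_N+1)}} \int_{\mathbb{R}^p} \int_{B_m(s_0, d_{(k_N+1)})} K_{h_N}(\theta_0 - \theta) f(\theta, s)\, d\theta\, ds - \frac{f(\theta_0, s_0)}{\bar f(s_0)} \right|.$$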



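For a fixed pair $(\theta_0, s_0)$ and all $h, \delta > 0$, set

$$\psi_{\theta_0, s_0}(h, \delta) = \left| \frac{V_m \delta^m}{C_\delta} \cdot \frac{1}{V_m \delta^m} \int_{\mathbb{R}^p} \int_{B_m(s_0, \delta)} K_h(\theta_0 - \theta) f(\theta, s)\, d\theta\, ds - \frac{f(\theta_0, s_0)}{\bar f(s_0)} \right|.$$

According to technical Lemma 6.1 (i), the quantity $V_m \delta^m / C_\delta$ tends to $1/\bar f(s_0)$
as $\delta \to 0^+$. Therefore, by the first statement of Theorem 3.2, we deduce that

$$\psi_{\theta_0, s_0}(h, \delta) \to 0 \quad \text{as } (h, \delta) \to (0^+, 0^+),$$

this being true for $\lambda_p \otimes \lambda_m$-almost all pairs $(\theta_0, s_0) \in \mathbb{R}^p \times \mathbb{R}^m$.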
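Next, introduce $\pi^\star$ (respectively, $f^\star$), the maximal function defined in
Theorem 3.1 (respectively, Theorem 3.2). Take any $\delta_0 > 0$. On the one hand, by
the very definition of $f^\star$,

$$\sup_{h > 0,\ \delta_0 \ge \delta > 0} \psi_{\theta_0, s_0}(h, \delta)
\le \sup_{0 < \delta \le \delta_0} \left[\frac{V_m \delta^m}{C_\delta}\right] f^\star(\theta_0, s_0) + \frac{f(\theta_0, s_0)}{\bar f(s_0)}.$$

On the other hand, for $\delta > \delta_0$,

$$\psi_{\theta_0, s_0}(h, \delta) \le \frac{1}{C_\delta} \int_{\mathbb{R}^p} K_h(\theta_0 - \theta) \pi(\theta)\, d\theta + \frac{f(\theta_0, s_0)}{\bar f(s_0)},$$

so that

$$\sup_{h > 0,\ \delta > \delta_0} \psi_{\theta_0, s_0}(h, \delta) \le \frac{\pi^\star(\theta_0)}{C_{\delta_0}} + \frac{f(\theta_0, s_0)}{\bar f(s_0)}.$$

Thus, putting all the pieces together, we infer that for $\lambda_p \otimes \lambda_m$-almost all
pairs $(\theta_0, s_0) \in \mathbb{R}^p \times \mathbb{R}^m$,

$$\sup_{h > 0,\ \delta > 0} \psi_{\theta_0, s_0}(h, \delta)
\le \sup_{0 < \delta \le \delta_0} \left[\frac{V_m \delta^m}{C_\delta}\right] f^\star(\theta_0, s_0) + \frac{\pi^\star(\theta_0)}{C_{\delta_0}} + \frac{2 f(\theta_0, s_0)}{\bar f(s_0)}. \qquad (5.8)$$

In consequence, by Lemma 6.1 (ii), Theorem 3.1 (ii) and Theorem 3.2 (ii),
for such pairs $(\theta_0, s_0)$,

$$\sup_{h > 0,\ \delta > 0} \psi_{\theta_0, s_0}(h, \delta) < \infty. \qquad (5.9)$$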



Now, since $d_{(k_N+1)} \to 0$ with probability 1 whenever $k_N/N \to 0$ (see, e.g.,
Devroye et al. [SJ Lemma 5.1]), we conclude by Lebesgue's dominated
convergence theorem that the bias term in (5.7) tends to 0 as $N \to \infty$.

To finish the proof, it remains to show that the first term of (5.7) vanishes
as $N \to \infty$. This is easier. Just note that, using again Theorem 2.1,



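$$\mathbb{E}\left[\big(\hat g_N(\theta_0) - \mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\big)^2 \,\big|\, d_{(k_N+1)}\right]$$
$$= \frac{1}{k_N h_N^{2p}} \cdot \frac{1}{C_{d_{(k_N+1)}}} \int_{\mathbb{R}^p} K^2\!\left(\frac{\theta_0 - \theta}{h_N}\right) \left[ \int_{B_m(s_0, d_{(k_N+1)})} f(\theta, s)\, ds \right] d\theta
- \frac{1}{k_N} \left(\mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\right)^2. \qquad (5.10)$$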
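Hence, if $K$ is bounded by, say, $\|K\|_\infty$,

$$\mathbb{E}\left[\big(\hat g_N(\theta_0) - \mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\big)^2 \,\big|\, d_{(k_N+1)}\right]
\le \frac{1}{k_N h_N^{2p}\, C_{d_{(k_N+1)}}} \int_{\mathbb{R}^p} K^2\!\left(\frac{\theta_0 - \theta}{h_N}\right) \left[ \int_{B_m(s_0, d_{(k_N+1)})} f(\theta, s)\, ds \right] d\theta$$
$$\le \frac{\|K\|_\infty}{k_N h_N^{p}\, C_{d_{(k_N+1)}}} \int_{\mathbb{R}^p} \int_{B_m(s_0, d_{(k_N+1)})} K_{h_N}(\theta_0 - \theta) f(\theta, s)\, d\theta\, ds.$$

Thus, using (5.9), we obtain

$$\mathbb{E}\left[\big(\hat g_N(\theta_0) - \mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\big)^2 \,\big|\, d_{(k_N+1)}\right] \le \frac{C}{k_N h_N^p},$$

for some positive constant $C$ depending on $\theta_0$, $s_0$ and $K$, but independent
of $h_N$ and $k_N$. This shows that the variance term goes to 0 as $k_N h_N^p \to \infty$
and concludes the proof of the theorem.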



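5.3 Proof of Theorem 3.4

We start as in the proof of Theorem 3.3 and write, using Fubini's theorem,

$$\mathbb{E}\left[\int_{\mathbb{R}^p} [\hat g_N(\theta_0) - g(\theta_0)]^2\, d\theta_0\right]
= \mathbb{E}\left[\int_{\mathbb{R}^p} \mathbb{E}\left[\big(\hat g_N(\theta_0) - \mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\big)^2 \,\big|\, d_{(k_N+1)}\right] d\theta_0\right]$$
$$+\ \mathbb{E}\left[\int_{\mathbb{R}^p} \left[\mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}] - g(\theta_0)\right]^2 d\theta_0\right]. \qquad (5.11)$$

It has already been seen that

$$\mathbb{E}\left[\big(\hat g_N(\theta_0) - \mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\big)^2 \,\big|\, d_{(k_N+1)}\right]
\le \frac{1}{k_N h_N^{2p}} \cdot \frac{1}{C_{d_{(k_N+1)}}} \int_{\mathbb{R}^p} \int_{B_m(s_0, d_{(k_N+1)})} K^2\!\left(\frac{\theta_0 - \theta}{h_N}\right) f(\theta, s)\, d\theta\, ds.$$

Consequently, by definition of $C_{d_{(k_N+1)}}$, we are led to

$$\int_{\mathbb{R}^p} \mathbb{E}\left[\big(\hat g_N(\theta_0) - \mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\big)^2 \,\big|\, d_{(k_N+1)}\right] d\theta_0
\le \frac{\int_{\mathbb{R}^p} K^2(\theta)\, d\theta}{k_N h_N^p}. \qquad (5.12)$$

This shows that the first term in (5.11) tends to 0 as $k_N h_N^p \to \infty$.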
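Let us now turn to the analysis of the bias term. With the notation of the
proof of Theorem 3.3, we may write

$$\mathbb{E}\left[\int_{\mathbb{R}^p} \left[\mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}] - g(\theta_0)\right]^2 d\theta_0\right]
= \mathbb{E}\left[\int_{\mathbb{R}^p} \psi^2_{\theta_0, s_0}(h_N, d_{(k_N+1)})\, d\theta_0\right].$$

It is known from the proof of Theorem 3.3 that the limit of $\psi_{\theta_0, s_0}(h, \delta)$ is 0
for $\lambda_p \otimes \lambda_m$-almost all $(\theta_0, s_0) \in \mathbb{R}^p \times \mathbb{R}^m$, whenever $(h, \delta) \to (0^+, 0^+)$. Take
any $\delta_0 > 0$. Denoting by $f^\star$ (respectively, $\pi^\star$) the maximal function defined
in Theorem 3.2 (respectively, Theorem 3.1), we also know (inequality (5.8))
that

$$\sup_{h > 0,\ \delta > 0} \psi_{\theta_0, s_0}(h, \delta)
\le \sup_{0 < \delta \le \delta_0} \left[\frac{V_m \delta^m}{C_\delta}\right] f^\star(\theta_0, s_0) + \frac{\pi^\star(\theta_0)}{C_{\delta_0}} + \frac{2 f(\theta_0, s_0)}{\bar f(s_0)}.$$

Thus, because $(a + b + c)^2 \le 3a^2 + 3b^2 + 3c^2$,

$$\sup_{h > 0,\ \delta > 0} \psi^2_{\theta_0, s_0}(h, \delta)
\le 3 \left( \sup_{0 < \delta \le \delta_0} \frac{V_m \delta^m}{C_\delta} \right)^2 f^{\star 2}(\theta_0, s_0) + \frac{3 \pi^{\star 2}(\theta_0)}{C_{\delta_0}^2} + \frac{12 f^2(\theta_0, s_0)}{\bar f^2(s_0)}.$$

By Lemma 6.1 (ii), the supremum on the right-hand side is bounded.
Moreover, by assumption, $f$ is in $L^2(\mathbb{R}^p \times \mathbb{R}^m)$. Therefore the function $\theta_0 \mapsto
f(\theta_0, s_0)$ is in $L^2(\mathbb{R}^p)$ as well for $\lambda_m$-almost all $s_0 \in \mathbb{R}^m$. Similarly, for
$\lambda_m$-almost all $s_0$, by Theorem 3.2 (iii), the function $\theta_0 \mapsto f^\star(\theta_0, s_0)$ is in $L^2(\mathbb{R}^p)$.
Finally, $\pi^\star$ belongs to $L^2(\mathbb{R}^p)$ by Theorem 3.1 (iii). Since $d_{(k_N+1)} \to 0$ with
probability 1 whenever $k_N/N \to 0$, the conclusion follows from Lebesgue's
dominated convergence theorem.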



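5.4 Proof of Theorem 4.1

Throughout the proof, it is assumed that the Lebesgue point $s_0$ is fixed and
such that $\bar f(s_0) > 0$. This forces $s_0$ to belong to the support of $\bar f$.

As in the proofs of Theorem 3.3 and Theorem 3.4, we set, for any $\theta_0 \in \mathbb{R}^p$
and all $h, \delta > 0$,

$$\psi_{\theta_0, s_0}(h, \delta) = \left| \frac{V_m \delta^m}{C_\delta} \cdot \frac{1}{V_m \delta^m} \int_{\mathbb{R}^p} \int_{B_m(s_0, \delta)} K_h(\theta_0 - \theta) f(\theta, s)\, d\theta\, ds - \frac{f(\theta_0, s_0)}{\bar f(s_0)} \right|,$$

where

$$C_\delta = \int_{B_m(s_0, \delta)} \bar f(s)\, ds.$$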



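With this notation, it is readily seen from identity (5.11) and identity
(5.10) that

$$\mathbb{E}\left[\int_{\mathbb{R}^p} [\hat g_N(\theta_0) - g(\theta_0 \,|\, s_0)]^2\, d\theta_0\right]
= \mathbb{E}\left[\int_{\mathbb{R}^p} \psi^2_{\theta_0, s_0}(h_N, d_{(k_N+1)})\, d\theta_0\right]
+ \frac{\int_{\mathbb{R}^p} K^2(\theta)\, d\theta}{k_N h_N^p}
- \frac{1}{k_N}\, \mathbb{E}\left[\int_{\mathbb{R}^p} \left(\mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\right)^2 d\theta_0\right].$$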



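Recall that

$$\mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]
= \frac{1}{C_{d_{(k_N+1)}}} \int_{\mathbb{R}^p} K_{h_N}(\theta_0 - \theta) \left[ \int_{B_m(s_0, d_{(k_N+1)})} f(\theta, s)\, ds \right] d\theta,$$

and the same arguments as in the proof of Theorem 3.4 reveal that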



sup 

h N >Q,L>d {kN+1) >0 



(E[g N (0 o )\d (kN+1) }) 2 < ( 



sup 

\0<5<L 



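Since $f$ is in $L^2(\mathbb{R}^p \times \mathbb{R}^m)$ by Assumption [A2], this ensures that for
$\lambda_m$-almost all $s_0 \in \mathbb{R}^m$,

$$\mathbb{E}\left[\int_{\mathbb{R}^p} \left(\mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\right)^2 d\theta_0\right] < \infty$$

and

$$\frac{1}{k_N}\, \mathbb{E}\left[\int_{\mathbb{R}^p} \left(\mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\right)^2 d\theta_0\right] = O\!\left(\frac{1}{k_N}\right).$$

In particular,

$$\frac{1}{k_N}\, \mathbb{E}\left[\int_{\mathbb{R}^p} \left(\mathbb{E}[\hat g_N(\theta_0) \,|\, d_{(k_N+1)}]\right)^2 d\theta_0\right] = o\!\left(\frac{1}{k_N h_N^p}\right).$$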

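The rest of the proof is devoted to the study of the rate of convergence to
0 of the quantity

$$\mathbb{E}\left[\int_{\mathbb{R}^p} \psi^2_{\theta_0, s_0}(h_N, d_{(k_N+1)})\, d\theta_0\right].$$

By an elementary change of variables, using the symmetry of $K$,

$$\frac{1}{V_m \delta^m} \int_{\mathbb{R}^p} \int_{B_m(s_0, \delta)} K_h(\theta_0 - \theta) f(\theta, s)\, d\theta\, ds
= \frac{1}{V_m} \int_{\mathbb{R}^p} \int_{B_m(0, 1)} K(\theta) f(\theta_0 + h\theta, s_0 + \delta s)\, d\theta\, ds.$$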
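Next, by the multivariate Taylor's theorem applied to $f$ around $(\theta_0, s_0)$
(which is valid here by Assumption [A3]),

$$f(\theta_0 + h\theta, s_0 + \delta s) = f(\theta_0, s_0) + \sum_{|\beta| = 1} D^\beta f(\theta_0, s_0)\, (h\theta, \delta s)^\beta
+ \sum_{|\beta| = 2} \frac{D^\beta f(\theta_0, s_0)}{\beta!}\, (h\theta, \delta s)^\beta$$
$$+ \sum_{|\beta| = 3} R_\beta(\theta_0 + h\theta, s_0 + \delta s)\, (h\theta, \delta s)^\beta,$$

where each component of the remainder term takes the form

$$R_\beta(\theta_0 + h\theta, s_0 + \delta s) = \frac{3}{\beta!} \int_0^1 (1 - t)^2 D^\beta f(\theta_0 + th\theta, s_0 + t\delta s)\, dt.$$

In view of the symmetry of $K$ and the ball $B_m(0, 1)$, it is clear that

$$\int_{\mathbb{R}^p} \int_{B_m(0,1)} K(\theta) \sum_{|\beta| = 1} D^\beta f(\theta_0, s_0)\, (h\theta, \delta s)^\beta\, d\theta\, ds = 0.$$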

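Similarly, elementary calculations reveal that

$$\frac{1}{V_m} \int_{\mathbb{R}^p} \int_{B_m(0,1)} K(\theta) \sum_{|\beta| = 2} \frac{D^\beta f(\theta_0, s_0)}{\beta!}\, (h\theta, \delta s)^\beta\, d\theta\, ds
= \phi_1(\theta_0, s_0) h^2 + \phi_2(\theta_0, s_0) \delta^2,$$

with

$$\phi_1(\theta_0, s_0) = \frac{1}{2} \sum_{i_1, i_2 = 1}^{p} \frac{\partial^2 f(\theta_0, s_0)}{\partial \theta_{i_1} \partial \theta_{i_2}} \int_{\mathbb{R}^p} \theta_{i_1} \theta_{i_2} K(\theta)\, d\theta
\quad \text{and} \quad
\phi_2(\theta_0, s_0) = \frac{1}{2 V_m} \sum_{j=1}^{m} \frac{\partial^2 f(\theta_0, s_0)}{\partial s_j^2} \int_{B_m(0,1)} s_j^2\, ds.$$

Using expression (3.2) of $V_m$, an elementary verification shows that

$$\frac{1}{V_m} \int_{B_m(0,1)} s_j^2\, ds = \frac{1}{m + 2}.$$

Thus, we see that

$$\phi_2(\theta_0, s_0) = \frac{1}{2m + 4} \sum_{j=1}^{m} \frac{\partial^2 f(\theta_0, s_0)}{\partial s_j^2}.$$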
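The identity $V_m^{-1} \int_{B_m(0,1)} s_j^2\, ds = 1/(m+2)$ says that a coordinate of a point drawn uniformly in the unit ball has second moment $1/(m+2)$. A quick Monte Carlo sanity check, with hypothetical helper names and an arbitrary sample size, could look like this:

```python
import math
import random


def uniform_in_unit_ball(m, rng):
    """Draw one point uniformly from the unit ball of R^m."""
    g = [rng.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(x * x for x in g))
    radius = rng.random() ** (1.0 / m)   # radial law with density m r^(m-1)
    return [radius * x / norm for x in g]


def mean_square_coordinate(m, n=20000, seed=0):
    """Monte Carlo estimate of (1/V_m) * integral of s_1^2 over B_m(0,1)."""
    rng = random.Random(seed)
    return sum(uniform_in_unit_ball(m, rng)[0] ** 2 for _ in range(n)) / n
```

Comparing `mean_square_coordinate(m)` with `1 / (m + 2)` for a few values of `m` reproduces the constant $1/(2m+4)$ appearing in $\phi_2$ up to simulation noise.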
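Let us now define $(\mathbf{h}, \boldsymbol{\delta}) = (h, \ldots, h, \delta, \ldots, \delta)$ (where $h$ is replicated $p$ times
and $\delta$ is replicated $m$ times) and care about the remainder term $R_\beta(\theta_0 +
h\theta, s_0 + \delta s)$. For any multi-index $\beta$ with $|\beta| = 3$, it holds

$$\int_{\mathbb{R}^p} \int_{B_m(0,1)} K(\theta) R_\beta(\theta_0 + h\theta, s_0 + \delta s)\, (h\theta, \delta s)^\beta\, d\theta\, ds = (\mathbf{h}, \boldsymbol{\delta})^\beta A_\beta(\theta_0, h, \delta),$$

where, by definition,

$$A_\beta(\theta_0, h, \delta) = \int_{\mathbb{R}^p} \int_{B_m(0,1)} K(\theta) R_\beta(\theta_0 + h\theta, s_0 + \delta s)\, (\theta, s)^\beta\, d\theta\, ds.$$

[Note that $A_\beta(\theta_0, h, \delta)$ depends in fact upon $s_0$ as well, but since this
dependency is not crucial, we leave it out in the notation.] Finally,

$$\frac{1}{V_m \delta^m} \int_{\mathbb{R}^p} \int_{B_m(s_0, \delta)} K_h(\theta_0 - \theta) f(\theta, s)\, d\theta\, ds
= f(\theta_0, s_0) + \phi_1(\theta_0, s_0) h^2 + \phi_2(\theta_0, s_0) \delta^2 + \frac{1}{V_m} \sum_{|\beta| = 3} (\mathbf{h}, \boldsymbol{\delta})^\beta A_\beta(\theta_0, h, \delta).$$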

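Considering now the function

$$\tau_{s_0}(\delta) = \frac{C_\delta}{V_m \delta^m} = \frac{1}{V_m \delta^m} \int_{B_m(s_0, \delta)} \bar f(s)\, ds = \frac{1}{V_m} \int_{B_m(0,1)} \bar f(s_0 + \delta s)\, ds,$$

and the asymptotic expansion of $1/\tau_{s_0}$ around 0, a similar analysis shows
that

$$\frac{V_m \delta^m}{C_\delta} = \frac{1}{\bar f(s_0)} - \frac{\phi_3(s_0)}{\bar f^2(s_0)}\, \delta^2 + \zeta_1(\delta)\, \delta^3,$$

where

$$\phi_3(s_0) = \frac{1}{2m + 4} \sum_{j=1}^{m} \frac{\partial^2 \bar f(s_0)}{\partial s_j^2}$$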

and, with a slight abuse of notation, there exists $t \in (0, 1)$ such that

H(t5) 



Ci(*) 



In this last expression, the function $H$ depends only on the successive
derivatives $D^\beta \bar f(s_0 + t\delta s)$ for $0 \le |\beta| \le 3$ and is therefore bounded thanks to
Assumption [A1]. Besides, by the very definition of $\xi_0$ and technical Lemma
6.3,

$$\tau_{s_0}(t\delta) = \frac{1}{V_m (t\delta)^m} \int_{B_m(s_0, t\delta)} \bar f(s)\, ds \ge \frac{\xi_0}{V_m} > 0.$$

Thus, the function $\zeta_1(\delta)$ is such that

$$\sup_{0 < \delta \le L} \zeta_1(\delta) < \infty.$$

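Putting all the pieces together, we conclude that

$$\psi_{\theta_0, s_0}(h, \delta) = \left| \phi_4(\theta_0, s_0) h^2 + \phi_5(\theta_0, s_0) \delta^2 + h^2 \zeta_2(\theta_0, h, \delta) + \delta^2 \zeta_3(\theta_0, h, \delta) \right|,$$

where

$$\phi_4(\theta_0, s_0) = \frac{\phi_1(\theta_0, s_0)}{\bar f(s_0)}
\quad \text{and} \quad
\phi_5(\theta_0, s_0) = \frac{\phi_2(\theta_0, s_0) \bar f(s_0) - \phi_3(s_0) f(\theta_0, s_0)}{\bar f^2(s_0)}.$$

Moreover, one can check, using Assumption [A2] and the second statement
of Assumption [A3] together with technical Lemma 6.2, that for $i = 2, 3$,

$$\zeta_i(\theta_0, h, \delta) \to 0 \quad \text{as } (h, \delta) \to (0^+, 0^+),$$

and

$$\sup_{0 < h \le M,\ 0 < \delta \le L} \int_{\mathbb{R}^p} \zeta_i^2(\theta_0, h, \delta)\, d\theta_0 < \infty$$

for all positive $M$. As a consequence,

$$\int_{\mathbb{R}^p} \psi^2_{\theta_0, s_0}(h, \delta)\, d\theta_0
= \Phi_1(s_0) h^4 + \Phi_2(s_0) \delta^4 + \Phi_3(s_0) h^2 \delta^2 + (h^2 + \delta^2)^2 \zeta_4(h, \delta),$$

where

$$\Phi_1(s_0) = \frac{1}{\bar f^2(s_0)} \int_{\mathbb{R}^p} \phi_1^2(\theta_0, s_0)\, d\theta_0,$$

$$\Phi_2(s_0) = \frac{1}{\bar f^4(s_0)} \int_{\mathbb{R}^p} \left[\phi_2(\theta_0, s_0) \bar f(s_0) - \phi_3(s_0) f(\theta_0, s_0)\right]^2 d\theta_0,$$

$$\Phi_3(s_0) = \frac{2}{\bar f^3(s_0)} \int_{\mathbb{R}^p} \phi_1(\theta_0, s_0) \left[\phi_2(\theta_0, s_0) \bar f(s_0) - \phi_3(s_0) f(\theta_0, s_0)\right] d\theta_0.$$

Besides, for all positive $M$,

$$\sup_{0 < h \le M,\ 0 < \delta \le L} \zeta_4(h, \delta) < \infty \qquad (5.13)$$

and

$$\zeta_4(h, \delta) \to 0 \quad \text{as } (h, \delta) \to (0^+, 0^+). \qquad (5.14)$$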
Finally,

$$\mathbb{E}\left[\int_{\mathbb{R}^p} [\hat g_N(\theta_0) - g(\theta_0 \,|\, s_0)]^2\, d\theta_0\right]
= \Phi_1(s_0) h_N^4 + \Phi_2(s_0)\, \mathbb{E}[d^4_{(k_N+1)}] + \Phi_3(s_0) h_N^2\, \mathbb{E}[d^2_{(k_N+1)}] + \frac{\int_{\mathbb{R}^p} K^2(\theta)\, d\theta}{k_N h_N^p}$$
$$+\ \mathbb{E}\left[(h_N^2 + d^2_{(k_N+1)})^2\, \zeta_4(h_N, d_{(k_N+1)})\right] + o\!\left(\frac{1}{k_N h_N^p}\right).$$

The conclusion is then an immediate consequence of (5.13)-(5.14) and
Assumption [A1], together with Proposition 6.1 and Proposition 6.2, which
respectively provide upper bounds on $\mathbb{E}[d^2_{(k_N+1)}]$ and $\mathbb{E}[d^4_{(k_N+1)}]$ depending
on the dimension $m$.



6 Some technical results 

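Lemma 6.1 Let $s_0 \in \mathbb{R}^m$ be a Lebesgue point of $\bar f$ such that $\bar f(s_0) > 0$. For
any $\delta > 0$, let

$$C_\delta = \int_{B_m(s_0, \delta)} \bar f(s)\, ds.$$

(i) One has

$$\frac{V_m \delta^m}{C_\delta} \to \frac{1}{\bar f(s_0)} \quad \text{as } \delta \to 0^+.$$

(ii) One has, for any $\delta_0 > 0$,

$$\sup_{0 < \delta \le \delta_0} \left[\frac{V_m \delta^m}{C_\delta}\right] < \infty.$$

Proof of Lemma 6.1 The first statement is an immediate consequence
of Lebesgue's differentiation theorem (Wheeden and Zygmund [I7J Theorem
7.2]).

Take now $\delta_0 > 0$. Since $\bar f(s_0) > 0$, it is routine to verify that the mapping

$$\delta \mapsto \frac{V_m \delta^m}{C_\delta}$$

is positive and continuous on $(0, \delta_0]$. Thus, by (i), we deduce that

$$\sup_{0 < \delta \le \delta_0} \left[\frac{V_m \delta^m}{C_\delta}\right] < \infty.$$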
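Lemma 6.2 Assume that the joint probability density $f$ is three times
continuously differentiable on $\mathbb{R}^p \times \mathbb{R}^m$ and let $\beta$ be a multi-index satisfying
$|\beta| = 3$. Assume that

$$\sup_{s \in \mathbb{R}^m} \int_{\mathbb{R}^p} \left[D^\beta f(\theta, s)\right]^2 d\theta < \infty,$$

and, for $h, \delta > 0$, consider the parameterized mapping $\theta_0 \mapsto A_\beta(\theta_0, h, \delta)$,
where

$$A_\beta(\theta_0, h, \delta) = \int_{\mathbb{R}^p} \int_{B_m(0,1)} K(\theta) R_\beta(\theta_0 + h\theta, s_0 + \delta s)\, (\theta, s)^\beta\, d\theta\, ds,$$

with

$$R_\beta(\theta_0 + h\theta, s_0 + \delta s) = \int_0^1 (1 - t)\, D^\beta f(\theta_0 + th\theta, s_0 + t\delta s)\, dt.$$

Then

$$\sup_{h, \delta > 0} \int_{\mathbb{R}^p} A_\beta^2(\theta_0, h, \delta)\, d\theta_0 < \infty.$$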

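Proof of Lemma 6.2 The proof relies on an application of the generalized
Minkowski's inequality (see, e.g., Hardy et al. [18, Theorem 202]). Indeed,

$$\left(\int_{\mathbb{R}^p} A_\beta^2(\theta_0, h, \delta)\, d\theta_0\right)^{1/2}
\le \int_{\mathbb{R}^p} \int_{B_m(0,1)} \int_0^1 E_\beta^{1/2}(\theta, s, t)\, (1 - t) K(\theta)\, |(\theta, s)^\beta|\, d\theta\, ds\, dt,$$

where

$$E_\beta(\theta, s, t) = \int_{\mathbb{R}^p} \left[D^\beta f(\theta_0 + th\theta, s_0 + t\delta s)\right]^2 d\theta_0.$$

Letting

$$C^2 = \sup_{s \in \mathbb{R}^m} \int_{\mathbb{R}^p} \left[D^\beta f(\theta, s)\right]^2 d\theta < \infty,$$

we obtain

$$\left(\int_{\mathbb{R}^p} A_\beta^2(\theta_0, h, \delta)\, d\theta_0\right)^{1/2}
\le C \int_{\mathbb{R}^p} \int_{B_m(0,1)} \int_0^1 (1 - t) K(\theta)\, |(\theta, s)^\beta|\, d\theta\, ds\, dt.$$

This upper bound is finite thanks to assumption [K2], and independent of $h$
and $\delta$. ■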
Lemma 6.3 Let $s_0$ be a Lebesgue point of $\bar f$ such that $\bar f(s_0) > 0$. Then, for
all positive $L$,

$$0 < \inf_{0 < \delta \le L} \frac{1}{\delta^m} \int_{B_m(s_0, \delta)} \bar f(s)\, ds < \infty.$$

Proof of Lemma 6.3 By exploiting the fact that $s_0$ is a Lebesgue point
of $\bar f$ satisfying $\bar f(s_0) > 0$, we deduce that for some positive $\delta_0 < L$,

$$0 < \inf_{0 < \delta \le \delta_0} \frac{1}{\delta^m} \int_{B_m(s_0, \delta)} \bar f(s)\, ds < \infty.$$

Moreover,

/ /(s)ds < inf -!- / /(s)ds < 

The quantity on the left-hand side is positive since s belongs to the support 
of /. This concludes the proof. ■ 

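Proposition 6.1 Assume that the support of $\bar f$ is compact with diameter
$L > 0$. Let $s_0$ be a Lebesgue point of $\bar f$ such that $\bar f(s_0) > 0$. Set

$$\xi_0 = \inf_{0 < \delta \le L} \frac{1}{\delta^m} \int_{B_m(s_0, \delta)} \bar f(s)\, ds.$$

Whenever $\frac{k_N + 1}{N + 1} \le \xi_0 L^m$, one has

1. For $m = 2$,

$$\mathbb{E}\left[d^2_{(k_N+1)}\right] \le \frac{1}{\xi_0} \left(1 + \log\left(\xi_0 L^2\, \frac{N+1}{k_N+1}\right)\right) \frac{k_N + 1}{N + 1}.$$

2. For $m \ne 2$,

$$\mathbb{E}\left[d^2_{(k_N+1)}\right] \le \frac{m}{\xi_0^{2/m} (m - 2)} \left(\frac{k_N + 1}{N + 1}\right)^{2/m} - \frac{L^{2-m}}{\xi_0 (m/2 - 1)}\, \frac{k_N + 1}{N + 1}.$$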

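Proof of Proposition 6.1 First note, according to Lemma 6.3, that $0 <
\xi_0 < \infty$. Next, observe that

$$\mathbb{E}\left[d^2_{(k_N+1)}\right] = \int_0^{L^2} \mathbb{P}\left\{d_{(k_N+1)} > \sqrt{\delta}\right\} d\delta.$$

For some fixed $a \in (0, L^2)$, we use the decomposition

$$\int_0^{L^2} \mathbb{P}\left\{d_{(k_N+1)} > \sqrt{\delta}\right\} d\delta
= \int_0^{a} \mathbb{P}\left\{d_{(k_N+1)} > \sqrt{\delta}\right\} d\delta + \int_a^{L^2} \mathbb{P}\left\{d_{(k_N+1)} > \sqrt{\delta}\right\} d\delta.$$

Introduce

$$p_0(\sqrt{\delta}) = \int_{B_m(s_0, \sqrt{\delta})} \bar f(s)\, ds,$$

which is positive since $s_0$ is in the support of $\bar f$. Using a binomial argument,
we see that

$$\mathbb{P}\left\{d_{(k_N+1)} > \sqrt{\delta}\right\}
= \sum_{j=0}^{k_N} \binom{N}{j} \left[p_0(\sqrt{\delta})\right]^j \left[1 - p_0(\sqrt{\delta})\right]^{N-j}
= \frac{1}{p_0(\sqrt{\delta})} \sum_{j=0}^{k_N} \binom{N}{j} \left[p_0(\sqrt{\delta})\right]^{j+1} \left[1 - p_0(\sqrt{\delta})\right]^{N-j}.$$

By applying technical Lemma 6.4, we obtain

$$\mathbb{P}\left\{d_{(k_N+1)} > \sqrt{\delta}\right\} \le \frac{k_N + 1}{N + 1} \times \frac{1}{p_0(\sqrt{\delta})}.$$

Consequently, since $p_0(\sqrt{\delta}) \ge \xi_0 \delta^{m/2}$ for $0 < \delta \le L^2$,

$$\mathbb{E}\left[d^2_{(k_N+1)}\right] \le a + \frac{k_N + 1}{\xi_0 (N + 1)} \int_a^{L^2} \delta^{-m/2}\, d\delta.$$

The conclusion is easily obtained by optimizing the right-hand side with
respect to the parameter $a$. ■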
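The binomial argument used in this proof can be checked by simulation. The sketch below works in a toy setting that is an assumption of this example only: $m = 1$, the $S_i$ uniform on $[0, 1]$ and $s_0 = 1/2$, so that the ball of radius $r \le 1/2$ has probability $p_0 = 2r$. It compares a Monte Carlo estimate of $\mathbb{P}\{d_{(k+1)} > r\}$ with the binomial sum.

```python
import math
import random


def binomial_cdf(N, k, p):
    """P(Bin(N, p) <= k), i.e. at most k of the N points fall in the ball."""
    return sum(math.comb(N, j) * p**j * (1 - p)**(N - j) for j in range(k + 1))


def simulated_tail(N, k, r, trials=20000, seed=0):
    """Monte Carlo estimate of P{d_(k+1) > r} for S_i uniform on [0,1], s0 = 1/2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d = sorted(abs(rng.random() - 0.5) for _ in range(N))
        hits += d[k] > r          # d[k] is the (k+1)-th smallest distance
    return hits / trials
```

With `N = 20`, `k = 3` and `r = 0.1`, the two quantities `simulated_tail(20, 3, 0.1)` and `binomial_cdf(20, 3, 0.2)` should agree up to simulation noise, and both sit below the bound $(k+1)/((N+1)\,p_0)$ used in the proof.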

Proposition 6.2 Assume that the support of $\bar f$ is compact with diameter
$L > 0$. Let $s_0$ be a Lebesgue point of $\bar f$ such that $\bar f(s_0) > 0$. Set

$$\xi_0 = \inf_{0 < \delta \le L} \frac{1}{\delta^m} \int_{B_m(s_0, \delta)} \bar f(s)\, ds.$$

Whenever $\frac{k_N + 1}{N + 1} \le \xi_0 L^m$, one has

1. For $m = 4$,

$$\mathbb{E}\left[d^4_{(k_N+1)}\right] \le \frac{1}{\xi_0} \left(1 + \log\left(\xi_0 L^4\, \frac{N+1}{k_N+1}\right)\right) \frac{k_N + 1}{N + 1}.$$

2. For $m \ne 4$,

$$\mathbb{E}\left[d^4_{(k_N+1)}\right] \le \frac{m}{\xi_0^{4/m} (m - 4)} \left(\frac{k_N + 1}{N + 1}\right)^{4/m} - \frac{L^{4-m}}{\xi_0 (m/4 - 1)}\, \frac{k_N + 1}{N + 1}.$$

Proof of Proposition 6.2 The proof is similar to that of Proposition 6.1
and is therefore omitted. ■

Lemma 6.4 For $j = 0, \ldots, N - 1$, let the map $\varphi_{N,j}(p)$ be defined by

$$\varphi_{N,j}(p) = \binom{N}{j} p^{j+1} (1 - p)^{N-j}, \quad 0 \le p \le 1.$$

Then, for all $i = 1, \ldots, N$,

$$\sup_{0 \le p \le 1} \sum_{j=0}^{i-1} \varphi_{N,j}(p) \le \frac{i}{N + 1}.$$

Proof of Lemma 6.4 Each map $\varphi_{N,j}$ is nonnegative, continuously
increasing on the interval $[0, (j + 1)/(N + 1)]$ and decreasing on $[(j + 1)/(N +
1), 1]$. Consequently, the supremum of the continuous function $\sum_{j=0}^{i-1} \varphi_{N,j}(p)$
is achieved at some point $p^\star$ of the interval $[1/(N + 1), i/(N + 1)]$. That is,

$$\sup_{0 \le p \le 1} \sum_{j=0}^{i-1} \varphi_{N,j}(p) = \sum_{j=0}^{i-1} \varphi_{N,j}(p^\star)
= p^\star \sum_{j=0}^{i-1} \binom{N}{j} (p^\star)^j (1 - p^\star)^{N-j} \le p^\star \le \frac{i}{N + 1}.$$
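The bound of Lemma 6.4 can also be checked numerically on a grid. The sketch below reads the lemma as $\sup_p \sum_{j<i} \binom{N}{j} p^{j+1} (1-p)^{N-j} \le i/(N+1)$; the function names and the grid size are illustrative choices.

```python
import math


def lemma_sum(N, i, p):
    """sum_{j=0}^{i-1} C(N, j) p^(j+1) (1-p)^(N-j)."""
    return sum(math.comb(N, j) * p**(j + 1) * (1 - p)**(N - j) for j in range(i))


def check_bound(N, grid=400):
    """True if the grid supremum stays below i/(N+1) for every i = 1..N."""
    for i in range(1, N + 1):
        sup = max(lemma_sum(N, i, t / grid) for t in range(grid + 1))
        if sup > i / (N + 1) + 1e-12:
            return False
    return True
```

A grid search is of course not a proof, but it is a cheap way to catch a misread exponent or index in the statement.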
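A Complements on singular integrals

Recall that the convolution (Wheeden and Zygmund [311 Chapter 6]) of two
measurable functions $f$ and $g$ in $\mathbb{R}^n$ is defined by

$$(f \star g)(x) = \int_{\mathbb{R}^n} f(x - y) g(y)\, dy,$$

provided the integral exists. This appendix is devoted to the study of some
properties of convolution when $\mathbb{R}^n = \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ and $g$ is of the form

$$g(x) = \varphi_{\varepsilon_1, \varepsilon_2}(x) = \frac{1}{\varepsilon_1^{n_1} \varepsilon_2^{n_2}}\, \varphi\!\left(\frac{x_1}{\varepsilon_1}, \frac{x_2}{\varepsilon_2}\right), \quad x = (x_1, x_2) \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}.$$

More precisely, the question of interest is to analyze the effect of letting $\varepsilon_1$
and $\varepsilon_2$ go independently to 0 in the expression $(f \star \varphi_{\varepsilon_1, \varepsilon_2})(x)$. We prove in
particular (Theorem A.1) that $(f \star \varphi_{\varepsilon_1, \varepsilon_2})(x) \to f(x)$ for $\lambda_n$-almost all $x$ if $f$
and $\varphi$ are suitably restricted.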

The issues discussed in the present appendix fall within the field of maximal
functions and approximation of the identity (Stein [33], Wheeden and
Zygmund [3Z]). The novelty is that we allow the family $\{\varphi_{\varepsilon_1, \varepsilon_2} : \varepsilon_1 > 0, \varepsilon_2 > 0\}$
(the so-called approximation of the identity) to depend upon two
independent parameters $\varepsilon_1$ and $\varepsilon_2$. Interestingly, the real analysis literature offers
little help with respect to this important question, which is however
fundamental in the study of multivariate nonparametric estimates. Valuable ideas
and comments in this respect are included in Devroye and Krzyżak |9J.

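Let $\varphi$ be an integrable function on $\mathbb{R}^n = \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$, termed "the kernel"
hereafter. It is assumed throughout that $\varphi$ is a product kernel, of the form

$$\varphi(x) = \varphi_1(x_1) \varphi_2(x_2), \quad x = (x_1, x_2) \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}. \qquad (A.1)$$

For $\varepsilon_1 > 0$ and $\varepsilon_2 > 0$, we set

$$\varphi_{\varepsilon_1, \varepsilon_2}(x) = \frac{1}{\varepsilon_1^{n_1} \varepsilon_2^{n_2}}\, \varphi_1\!\left(\frac{x_1}{\varepsilon_1}\right) \varphi_2\!\left(\frac{x_2}{\varepsilon_2}\right).$$

We will need the following assumption.

Assumption [K] For $i = 1, 2$, the functions

$$\psi_i(x_i) = \sup_{\|y_i\| \ge \|x_i\|} |\varphi_i(y_i)|, \quad x_i \in \mathbb{R}^{n_i},$$

are in $L^1(\mathbb{R}^{n_i})$, with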
If $f$ is a locally integrable function in $\mathbb{R}^n$, we also denote by $M_{12} f$ the
associated Hardy-Littlewood maximal function with two degrees of freedom. It
is defined for $x = (x_1, x_2)$ by

$$(M_{12} f)(x) = \sup_{\varepsilon_1, \varepsilon_2 > 0} \frac{1}{V_{n_1} \varepsilon_1^{n_1} V_{n_2} \varepsilon_2^{n_2}} \int_{B_{n_1}(x_1, \varepsilon_1)} \int_{B_{n_2}(x_2, \varepsilon_2)} |f(y_1, y_2)|\, dy_1\, dy_2,$$

where $B_{n_1}(x_1, \varepsilon_1)$ (respectively, $B_{n_2}(x_2, \varepsilon_2)$) is the closed ball in $\mathbb{R}^{n_1}$
(respectively, $\mathbb{R}^{n_2}$), with center at $x_1$ (respectively, $x_2$) and radius $\varepsilon_1$ (respectively,
$\varepsilon_2$), and $V_{n_1}$ (respectively, $V_{n_2}$) is the volume of the unit ball in $\mathbb{R}^{n_1}$
(respectively, $\mathbb{R}^{n_2}$).

Our objective is to prove the following theorem, which is a more general
version of Theorem 3.2.

Theorem A.1 Let $f$ be a measurable function in $\mathbb{R}^n$ satisfying

$$\int_{\mathbb{R}^n} |f(x)| \left(1 + \log^+ |f(x)|\right) dx < \infty, \qquad (A.2)$$

and let $\varphi$ be a product kernel of the form (A.1) satisfying Assumption [K].
Assume, in addition, that

$$\int_{\mathbb{R}^n} \varphi(x)\, dx = 1.$$

(i) For $\lambda_n$-almost all $x \in \mathbb{R}^n$,

$$\lim_{\varepsilon_1, \varepsilon_2 \to 0} (f \star \varphi_{\varepsilon_1, \varepsilon_2})(x) = f(x).$$

(ii) For $\lambda_n$-almost all $x \in \mathbb{R}^n$,

$$\sup_{\varepsilon_1, \varepsilon_2 > 0} |(f \star \varphi_{\varepsilon_1, \varepsilon_2})(x)| \le A\, (M_{12} f)(x) < \infty,$$

where C is a constant independent of $\varphi$, $f$ and $x$, and $A$ is the constant
of Assumption [K].

(iii) Moreover, if $f$ is in $L^q(\mathbb{R}^n)$, $1 < q \le \infty$, then $M_{12} f$ is in $L^q(\mathbb{R}^n)$ and

$$\|M_{12} f\|_q \le c_q \|f\|_q,$$

where the constant $c_q$ depends only on $q$ and the dimension $n$.
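Statement (i) can be illustrated on a case where everything is explicit. In the sketch below, which is a toy example chosen for this purpose and not taken from the paper, $n_1 = n_2 = 1$, $f$ is the standard Gaussian density on $\mathbb{R} \times \mathbb{R}$ and $\varphi$ is the Gaussian product kernel, so that $f \star \varphi_{\varepsilon_1, \varepsilon_2}$ is the centered Gaussian density with variances $1 + \varepsilon_1^2$ and $1 + \varepsilon_2^2$. The convolution is computed by a midpoint rule after the change of variables $y \mapsto (x_1 - \varepsilon_1 y_1, x_2 - \varepsilon_2 y_2)$.

```python
import math


def gauss(x, var=1.0):
    """Centered Gaussian density with variance var."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)


def f(x1, x2):
    """Toy integrand: standard Gaussian density on R x R."""
    return gauss(x1) * gauss(x2)


def smoothed(x1, x2, e1, e2, n=60, span=6.0):
    """Midpoint-rule approximation of (f * phi_{e1,e2})(x1, x2)."""
    step = 2 * span / n
    nodes = [-span + (i + 0.5) * step for i in range(n)]
    total = 0.0
    for y1 in nodes:
        w1 = gauss(y1) * step                 # kernel weight in the first block
        for y2 in nodes:
            total += f(x1 - e1 * y1, x2 - e2 * y2) * w1 * gauss(y2) * step
    return total
```

Letting `e1` and `e2` shrink at unrelated speeds, for example `smoothed(0.3, -0.2, 0.05, 0.01)`, gives values close to `f(0.3, -0.2)`, in line with the fact that the two bandwidths are allowed to go to 0 independently.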
Proof of Theorem A.1 To prove the theorem, we will need some general
results on singular integrals and Hardy-Littlewood maximal functions. As
shown in de Guzmán [6l page 50], for all $\alpha > 0$ and a locally integrable $f$,

$$\lambda_n\left(\{x \in \mathbb{R}^n : (M_{12} f)(x) > \alpha\}\right) \le c \int_{\mathbb{R}^n} \frac{|f(x)|}{\alpha} \left(1 + \log^+ \frac{|f(x)|}{\alpha}\right) dx, \qquad (A.3)$$

where $c$ is a constant independent of $f$ and $\alpha$. This result will be crucial in
our proof. It easily follows that whenever

$$\int_{\mathbb{R}^n} |f(x)| \left(1 + \log^+ |f(x)|\right) dx < \infty,$$

then $(M_{12} f)(x) < \infty$ at $\lambda_n$-almost all $x$.

Proof of (ii) The proof follows arguments of Stein [I3j pages 63-64]. For
$i = 1, 2$, with a slight abuse of notation, we write $\psi_i(r_i) = \psi_i(x_i)$ if $r_i = \|x_i\|$.
This should cause no confusion since each $\psi_i$ is anyway radial. Observe that,
for $i = 1, 2$,

$$\int_{r_i/2 \le \|x_i\| \le r_i} \psi_i(x_i)\, dx_i \ge \psi_i(r_i) \int_{r_i/2 \le \|x_i\| \le r_i} dx_i \propto \psi_i(r_i)\, r_i^{n_i}.$$

Therefore, the assumption $\psi_i \in L^1(\mathbb{R}^{n_i})$ proves that $r_i^{n_i} \psi_i(r_i) \to 0$, as $r_i \to 0$
or $r_i \to \infty$.

To prove (ii), it is enough to show that for all nonnegative $f$ satisfying (A.2),
all $\varepsilon_1 > 0$, $\varepsilon_2 > 0$,

$$(f \star \psi_{\varepsilon_1, \varepsilon_2})(x) \le A\, (M_{12} f)(x), \qquad (A.4)$$

where

$$\psi_{\varepsilon_1, \varepsilon_2}(x) = \frac{1}{\varepsilon_1^{n_1} \varepsilon_2^{n_2}}\, \psi_1\!\left(\frac{x_1}{\varepsilon_1}\right) \psi_2\!\left(\frac{x_2}{\varepsilon_2}\right), \quad x = (x_1, x_2) \in \mathbb{R}^n.$$

Set $\psi = \psi_1 \psi_2$. Since assertion (A.4) is clearly translation invariant (with
respect to $f$) and also dilatation invariant (with respect to $\psi$), it suffices to
show that

$$(f \star \psi)(0) \le A\, (M_{12} f)(0).$$

Moreover, recalling (A.3), we may clearly assume that $(M_{12} f)(0) < \infty$. For
$i = 1, 2$, denote by $S^{n_i - 1}$ the unit $(n_i - 1)$-sphere in $\mathbb{R}^{n_i}$ and let $\sigma_i$ be the
corresponding spherical measure.
We set as well



£( ri ,r 2 ) = / /(rix 1 ,r 2 x 2 )d(7i(x 1 )d(72(x2), 

Ai(n,r 2 )=/ ^(ui,r 2 )u? 1_1 dui, 

A(n,r 2 )=/ A^n^K^M^ 
Jo 

Jo Jo 

and will repeatedly use the inequality 

A(n,r 2 ) = / / /(x)dx < K 1 .K 2 rr i r 2 n2 (M 12 /)(0). (A.5) 

JB„, ro.nl JB nn (0.n) 



'B ni (0,ri) JB„ 2 (0,r 2 ) 

With this notation, we have 



(/*^)(0)= / /(x)^(x)dx 

noo 
£(r 1 ,r 2 )^ 1 (r 1 )rr~V 2 (r 2 )r 2 " 2 - 1 dr 1 dr 2 

r-N 2 r /-JVi 

lim / / £(ri, r 2 )^i(ri)r™ 1_1 dri 
A, 



El 

JVl -¥ oo 

E2 -> 
iV 2 — > oo 



■02 (^2)^2 2 Mr 2 . 



Denote by ii(£i,iVi) the integral inside the brackets. We may write, using 
an integration by parts (in the sense of Stieltjes-Lebesgue) 

/•JVl 

Jifci.JVO = / A 1 (r 1 ,r 2 )d(-V'i(r 1 ))+A 1 (7V 1 ,r 2 )Vi(iVi)-A 1 (£ 1 ,r 2 )Vi(£i). 
J £1 



Consequently, 

"JV 2 



/•JV 2 

/ Ji(£i,iVi)^2(r 2 )r 2 " 2 - 1 dr 2 

J £2 

pN 2 /-JVi 

= / / A 1 (r 1 ,r 2 )d(-^ 1 (r 1 ))V 2 (r 2 )r 2 "^ 1 dr 2 

J £ 2 ^£l 

+ / A^TV^r,)^^^)^^)^- 1 ^, 

J £2 

- / A 1 (e 1 ,r 2 )^i(ei)^ 2 (r 2 )r 2 : 2 - 1 dr 2 
J £ 2 



'£2 

Ia + Ib- Ic- 



Each term of the sum is analyzed separately. Using again an integration by 
parts, we are led to 



Ni 



N 2 



A(r 1; r 2 )d (-^(r 2 )) + A(n, N 2 )fo(N 2 ) 



= 2 



A(ri, £2)^2(^2) d(-0i(ri)) 



ATi /.JVa 



A(ri,r 2 )d(-'0i(r 1 ))d(-^2(»"2)) 



£1 ^£ 2 

Ni 



+ 



A(n,JV 2 )^(iV 2 )d(-^i(ri)) 



£1 



A(r 1 ,£ 2 )^ 2 (£ 2 )d(-^i(r 1 )) 
= A 1 + A 2 - A 3 . 

The main term, Ax, is handled as follows via inequality (1A.5j) : 

poo poo 

A\ < V ni .V n2 (Mi 2 f)(0) I I r"V™ 2 d(-^ 1 (r 1 ))d(-^ 2 (r 2 )) 
<A(M 12 /)(0), 
since for i = 1,2, we have 



jo 



rfd (-ipi{ r i)) 



/ ^( x i)dxi < a/A, 



by Assumption [K]. The remaining terms, A 2 and A3, converge to 0. To see 
this, just note that 

POO 

A 2 < V ni .V n2 (M 12 f)(0) x N^ 2 (N 2 ) / r ^d(-^ 1 (r 1 )) , 



which goes to since the integral is convergent and N 2 2 tp 2 (N 2 ) — > as 
N 2 — > 00. Similarly, 

POO 

A 3 < V ni .V n2 (M 12 f)(0) x e^fo) / ^(-^(n)) . 

Jo 

The term on the right-hand side tends to 0 since $\varepsilon_2^{n_2} \psi_2(\varepsilon_2) \to 0$ as $\varepsilon_2 \to 0$.

Using similar arguments, it is easy to prove that $I_B$ and $I_C$ go to 0 as $\varepsilon_1, \varepsilon_2 \to 0$
and $N_1, N_2 \to \infty$. Proof of (ii) is therefore complete.
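Proof of (i) For the sake of clarity, the proof is divided into three steps.

Step 1 If $f$ is continuous and has compact support, then the result is easy
to verify. Indeed, we have in this case

$$(f \star \varphi_{\varepsilon_1, \varepsilon_2})(x) = \int_{\mathbb{R}^{n_1}} \int_{\mathbb{R}^{n_2}} f(x_1 - \varepsilon_1 y_1, x_2 - \varepsilon_2 y_2)\, \varphi(y_1, y_2)\, dy_1\, dy_2,$$

whence, using the fact that $\int_{\mathbb{R}^n} \varphi(x)\, dx = 1$,

$$|(f \star \varphi_{\varepsilon_1, \varepsilon_2})(x) - f(x)|
\le \int_{\mathbb{R}^{n_1}} \int_{\mathbb{R}^{n_2}} |f(x_1 - \varepsilon_1 y_1, x_2 - \varepsilon_2 y_2) - f(x)|\, |\varphi(y_1, y_2)|\, dy_1\, dy_2$$
$$\le \sup_{x_1, x_2, y_1, y_2} |f(x_1 - \varepsilon_1 y_1, x_2 - \varepsilon_2 y_2) - f(x)| \int_{\mathbb{R}^{n_1}} \int_{\mathbb{R}^{n_2}} |\varphi(y_1, y_2)|\, dy_1\, dy_2.$$

Since $f$ is uniformly continuous, this term tends to 0.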

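Step 2 We establish that $\lim_{\varepsilon_1, \varepsilon_2 \to 0} (f \star \varphi_{\varepsilon_1, \varepsilon_2})(x)$ exists for $\lambda_n$-almost all
$x \in \mathbb{R}^n$. As for now, to ease the notation, we set $g^\star_{\varepsilon_1, \varepsilon_2}(x) = (g \star \varphi_{\varepsilon_1, \varepsilon_2})(x)$,
and let

$$(\Omega g)(x) = \left| \limsup_{\varepsilon_1, \varepsilon_2 \to 0} g^\star_{\varepsilon_1, \varepsilon_2}(x) - \liminf_{\varepsilon_1, \varepsilon_2 \to 0} g^\star_{\varepsilon_1, \varepsilon_2}(x) \right|.$$

Let $\alpha > 0$ and $\delta > 0$ be arbitrary. Thanks to Proposition A.1 at the end of
the section, we may write $f = h + g$, where $h$ is continuous with compact
support and $g$ is such that

$$\int_{\mathbb{R}^n} \frac{|g(x)|}{\alpha} \left(1 + \log^+ \frac{|g(x)|}{\alpha}\right) dx \le \delta.$$

By (ii), we have at $\lambda_n$-almost all $x$,

$$(\Omega g)(x) \le 2A\, (M_{12} g)(x).$$

Thus, by (A.3),

$$\lambda_n\left(\{x \in \mathbb{R}^n : (\Omega g)(x) > 2A\alpha\}\right) \le c \int_{\mathbb{R}^n} \frac{|g(x)|}{\alpha} \left(1 + \log^+ \frac{|g(x)|}{\alpha}\right) dx \le c\delta.$$

Clearly, $\Omega f \le \Omega g + \Omega h$ and, by Step 1, $\Omega h = 0$. Therefore

$$\lambda_n\left(\{x \in \mathbb{R}^n : (\Omega f)(x) > 2A\alpha\}\right) \le c\delta.$$

Since $\alpha$ and $\delta$ can be taken arbitrarily, we conclude that

$$\lambda_n\left(\{x \in \mathbb{R}^n : (\Omega f)(x) > 0\}\right) = 0.$$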



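Step 3 We finally prove that, for $\lambda_n$-almost all $x \in \mathbb{R}^n$,

$$\lim_{\varepsilon_1, \varepsilon_2 \to 0} (f \star \varphi_{\varepsilon_1, \varepsilon_2})(x) = f(x).$$

Set $f_1(x) = \lim_{\varepsilon_1, \varepsilon_2 \to 0} f^\star_{\varepsilon_1, \varepsilon_2}(x)$ (this limit exists $\lambda_n$-almost everywhere by
Step 2). Fix $\alpha > 0$, $\delta > 0$, and choose $h$ continuous with compact support
as in Step 2 such that

$$\int_{\mathbb{R}^n} \frac{|(f - h)(x)|}{\alpha} \left(1 + \log^+ \frac{|(f - h)(x)|}{\alpha}\right) dx \le \delta.$$

For $\lambda_n$-almost all $x \in \mathbb{R}^n$,

$$|f(x) - f_1(x)| \le |f(x) - h(x)| + \left| \lim_{\varepsilon_1, \varepsilon_2 \to 0} h^\star_{\varepsilon_1, \varepsilon_2}(x) - \lim_{\varepsilon_1, \varepsilon_2 \to 0} f^\star_{\varepsilon_1, \varepsilon_2}(x) \right| = \Delta_1 + \Delta_2.$$

By (ii),

$$\Delta_2 \le \sup_{\varepsilon_1, \varepsilon_2 > 0} |(f - h)^\star_{\varepsilon_1, \varepsilon_2}(x)| \le A\, (M_{12} |f - h|)(x).$$

Thus,

$$\lambda_n\left(\{x \in \mathbb{R}^n : |f(x) - f_1(x)| > 2A\alpha\}\right)
\le \lambda_n\left(\{x \in \mathbb{R}^n : |f(x) - h(x)| > A\alpha\}\right) + \lambda_n\left(\{x \in \mathbb{R}^n : (M_{12} |f - h|)(x) > \alpha\}\right)$$
$$\le \frac{\delta}{A} + c \int_{\mathbb{R}^n} \frac{|(f - h)(x)|}{\alpha} \left(1 + \log^+ \frac{|(f - h)(x)|}{\alpha}\right) dx.$$

In the second inequality, we used Markov's inequality together with inequality
(A.3). Since both $\alpha$ and $\delta$ can be chosen arbitrarily, we conclude that

$$\lambda_n\left(\{x \in \mathbb{R}^n : |f(x) - f_1(x)| > 0\}\right) = 0.$$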
Proof of (iii) The proof is adapted from Zygmund [4"5l page 307]. Let the
partial maximal functions be defined for $x = (x_1, x_2)$ by

$$(M_1 f)(x) = \sup_{\varepsilon_1 > 0} \frac{1}{V_{n_1} \varepsilon_1^{n_1}} \int_{B_{n_1}(x_1, \varepsilon_1)} |f(y_1, x_2)|\, dy_1$$

and

$$(M_2 f)(x) = \sup_{\varepsilon_2 > 0} \frac{1}{V_{n_2} \varepsilon_2^{n_2}} \int_{B_{n_2}(x_2, \varepsilon_2)} |f(x_1, y_2)|\, dy_2.$$

From these definitions, it is clear that

$$(M_{12} f)(x) \le (M_1 (M_2 f))(x).$$

But, for $1 < q \le \infty$, $f_1 \in L^q(\mathbb{R}^{n_1})$, $f_2 \in L^q(\mathbb{R}^{n_2})$, it is known (see, e.g., Stein
Theorem 1, page 5]), that

$$\|M_1 f_1\|_q \le c_{1,q} \|f_1\|_q \quad \text{and} \quad \|M_2 f_2\|_q \le c_{2,q} \|f_2\|_q,$$

where the constants $c_{1,q}$ and $c_{2,q}$ depend only on $n_1$, $n_2$ and $q$. It immediately
follows that

$$\|M_{12} f\|_q^q \le c_{1,q}^q\, c_{2,q}^q\, \|f\|_q^q.$$

This concludes the proof of the theorem. ■

Proposition A.1 Let $\Phi : \mathbb{R}^+ \to \mathbb{R}^+$ be a continuous and nondecreasing
function satisfying $\Phi(0) = 0$, and let $f$ be a measurable function from $\mathbb{R}^n$ to
$\mathbb{R}$ such that

$$\int_{\mathbb{R}^n} \Phi(|f(x)|)\, dx < \infty.$$

Then, for all $\delta > 0$, there exists a function $h$ continuous with compact support
such that

$$\int_{\mathbb{R}^n} \Phi(|f(x) - h(x)|)\, dx \le \delta.$$

Proof of Proposition A.1 First, assume that $f(x) \ge 0$ for all $x$. Take
$\{f_t\}$ a sequence of nonnegative continuous functions, each with compact
support and such that $0 \le f_t(x) \uparrow f(x)$ at $\lambda_n$-almost all $x \in \mathbb{R}^n$. For such an
$x$, by the continuity of $\Phi$ at 0, one has $\Phi(f(x) - f_t(x)) \to \Phi(0) = 0$. Since
$\Phi(f(x) - f_t(x)) \le \Phi(f(x))$ and $\Phi(f)$ is in $L^1(\mathbb{R}^n)$ by assumption, we may
apply Lebesgue's dominated convergence theorem and conclude that

$$\int_{\mathbb{R}^n} \Phi(f(x) - f_t(x))\, dx \to 0 \quad \text{as } t \to \infty.$$

If we drop the assumption that $f(x) \ge 0$, we may split $f$ into positive and
negative part and apply the above result. ■